ABSTRACT


We  consider  identification  problems  for  members  of 
the  exponential  family,  applying  the  results  to  the  density 
of  the  logistic  model  for  life  table  data. 

For  this  model  we  want  to  know  the  following: 

1.  When  are  the  maximum  likelihood  estimates  for 
the  model  parameters  unique; 

2. What type of inferences may be made if the maximum
likelihood estimates are not unique?

Noting  the  form  of  the  likelihood  for  the  logistic 
model,  question  1 is  considered  for  the  exponential  family. 

We  obtain  an  answer  for  question  1 in  this  context.  Applying 
this  answer  to  the  logistic  model,  we  find  that  a unique 
maximum  likelihood  estimate  exists  if  and  only  if  the  density 
is  in  one  to  one  correspondence  with  its  parameter  space. 

To  answer  question  2,  we  consider  members  of  the 
exponential  family  where  the  density  is  not  in  one  to  one 
correspondence  with  its  parameter  space.  As  a guiding 
example of such a density, we consider the normal linear
model of less than full rank, discussing the concepts of
estimable function and testable hypothesis, which have been
developed for this particular case. We then show that the
concept of uniform identifiability is a generalization of
the  concept  of  an  estimable  function.  Further,  through  the 
idea  of  an  identifiable  set,  we  extend  the  concept  of  a 
testable  hypothesis.  We  then  apply  the  resulting  theory  to 
the logistic model.

INTRODUCTION


Thompson  (1976)  introduced  a logistic  model  for 
covariate  effect  in  the  analysis  of  grouped  life  times. 
Maximum  likelihood  is  proposed  as  a method  of  estimating 
the parameters of the model. It is noted that the likelihood
function is both concave and differentiable everywhere
on  the  parameter  space;  thus,  a point  is  a global  maximum 
if  and  only  if  it  is  a solution  to  the  likelihood  equations. 
However,  in  a numerical  example,  the  likelihood  equations 
are  linearly  dependent  and,  to  obtain  a unique  solution, 
a constraint  must  be  imposed  on  the  parameters.  This  causes 
a problem  in  the  application  of  the  logistic  model;  based 
on  the  same  data  two  statisticians  might  obtain  different 
maximum  likelihood  estimates  for  the  parameters.  One  might 
be  able  to  correct  the  problem  of  non-uniqueness  by  a 
reparameterization  of  the  model,  but,  in  doing  so,  the 
physical  meaning  associated  with  the  parameters  might  be 
lost;  thus,  it  is  desirable  to  know  the  following: 

1. When are the maximum likelihood estimates for
the parameters of the model unique?

2. What type of inference may be made if the maximum
likelihood estimates are not unique?

Noting the form of the likelihood for the logistic
model given in Thompson (1976), we will consider questions
1 and 2 for the exponential family and use the logistic
model as an example to illustrate the resulting theory. We
first consider background material for the exponential
family.

2. SOME PROPERTIES OF THE EXPONENTIAL FAMILY

Let $\mu$ be a $\sigma$-finite measure on $E^p$ and

(1)  $p(y;\alpha) = \exp(\alpha^T y - \phi(\alpha))$

be a probability density function with respect to $\mu$.
Families of form (1) are said to be exponential families
with natural parameter $\alpha$; see Lehmann (1959).

Now

$$\int p(y;\alpha)\,d\mu(y) = 1 ;$$

thus,

(2)  $\phi(\alpha) = \ln \int \exp(\alpha^T y)\,d\mu(y)$,

or $\phi(\cdot)$ is the log moment generating function of $\mu$.

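Let $A = \{\alpha \mid \phi(\alpha) < \infty\}$; then $A$ is said to be the
natural parameter space of (1). Lehmann shows that $A$ is
a convex set. For $\alpha^0$ in $A^0$, the interior of $A$, the
moments of (1) can be obtained from (2) by differentiating
under the integral sign. In particular, if $Y = (Y_1,\dots,Y_p)^T$
has density $p(y;\alpha^0)$ then

$$E(Y_i) = \left.\frac{\partial \phi(\alpha)}{\partial \alpha_i}\right|_{\alpha=\alpha^0}$$

and

$$\mathrm{cov}(Y_i,Y_j) = \left.\frac{\partial^2 \phi(\alpha)}{\partial \alpha_i\,\partial \alpha_j}\right|_{\alpha=\alpha^0} .$$

Let $\Sigma(\alpha^0)$ be the covariance matrix of $Y$ evaluated at $\alpha^0$.
Then

$$\Sigma(\alpha^0) = \left(\left.\frac{\partial^2 \phi(\alpha)}{\partial \alpha_i\,\partial \alpha_j}\right|_{\alpha=\alpha^0}\right) .$$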
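As a quick check of (2) and the moment formulas, consider the Poisson case, which reappears in Example 1. The choice of base measure below is an illustrative assumption and is not taken from the original text.

```latex
% Worked example (assumption: \mu puts mass 1/y! on y = 0, 1, 2, \dots)
\begin{align*}
\phi(\alpha) &= \ln \sum_{y=0}^{\infty} \frac{e^{\alpha y}}{y!} = e^{\alpha}, \\
p(y;\alpha)  &= \exp(\alpha y - e^{\alpha}), \\
E(Y) &= \phi'(\alpha) = e^{\alpha}, \qquad
\operatorname{Var}(Y) = \phi''(\alpha) = e^{\alpha}.
\end{align*}
```

Under this assumption $Y$ is Poisson with mean $e^{\alpha}$, the natural parameter space is the whole real line, and the first two derivatives of $\phi$ return the familiar mean and variance.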


A basic  property  of  the  exponential  family  is  that 
the  range  of  Y does  not  depend  on  the  choice  of  parameter. 

Lemma 1. The support of $\mu$ is the support of $P_\alpha$
for all $\alpha$.

Proof. Let

$$P_\alpha(K) = \int_K p(y;\alpha)\,d\mu(y) .$$

$P_\alpha(K) = 0$ implies $\int_K \exp(\alpha^T y)\,d\mu = 0$; however,
$\exp(\alpha^T y) > 0$ for all $y$. Therefore, $P_\alpha(K) = 0$ if and
only if $\mu(K) = 0$.

At  this  point  we  need  two  preliminary  results  having 
indirect  implications  for  the  exponential  family. 

Lemma 2. Let $\Sigma$ be a positive semi-definite matrix,
then $\Sigma g = 0$ if and only if $g^T \Sigma g = 0$.

Proof. $\Sigma$ may be represented as

$$P^T \Lambda P ,$$

where $P$ is an orthogonal matrix and $\Lambda$ is a diagonal
matrix whose entries, $\delta_j$, are the eigenvalues of $\Sigma$; thus,

$$g^T \Sigma g = g^T P^T \Lambda P g = (Pg)^T \Lambda (Pg) .$$

Hence, letting $Pg = v$, we have

$$g^T \Sigma g = \sum_{j=1}^{p} \delta_j v_j^2 .$$

Let $g^T \Sigma g = 0$. Now $\delta_j \ge 0$, $j = 1,\dots,p$, so that
$\delta_j v_j = 0$. Thus

$$\Sigma g = P^T \Lambda P g = P^T \Lambda v = P^T (\delta_1 v_1, \dots, \delta_p v_p)^T = 0 .$$

This proves the "if" part. The converse is obvious.

We will denote the column space of a matrix $M$ (the
set of all linear combinations of the columns of $M$) by Col($M$)
and the rank of $M$ (the number of linearly independent
rows in $M$) by Rank($M$).

Lemma 3. Let $Y$ be a random vector with covariance
$\Sigma$. Writing $L = \bigcap_{i=1}^{k} \{y \mid b_i^T y = c_i\}$ for the smallest linear
manifold containing the support of $Y$, then

$$\mathrm{Col}(\Sigma) = \bigcap_{i=1}^{k} \{y \mid b_i^T y = 0\} .$$

Proof. Let $S$ be the support of $Y$ and

$$L^* = \bigcap_{i=1}^{k} \{y \mid b_i^T y = 0\} .$$

We will show that $g$ is perpendicular to $L^*$ if and only
if $g$ is perpendicular to Col($\Sigma$); thus, due to the
uniqueness of the orthogonal complement of a subspace, we
will have the result.

Let $g$ be perpendicular to $L^*$. Now, $L = \ell^0 + L^*$
where $\ell^0$ is a fixed but arbitrary element of $L$; thus,
$g^T \ell = g^T \ell^0$ is constant for all $\ell$ in $L$. That $g^T \ell$ is
constant for all $\ell$ in $L$ implies $g^T y$ is constant for
all $y$ in $S$; thus,

$$\mathrm{Var}(g^T Y) = g^T \Sigma g = 0$$

and, by Lemma 2, $g$ is perpendicular to Col($\Sigma$).

Let $g$ be perpendicular to Col($\Sigma$); then $g^T \Sigma g = \mathrm{Var}(g^T Y) = 0$.
$\mathrm{Var}(g^T Y) = 0$ implies there exists a constant
$c$ such that $g^T Y = c$ with probability one. By
definition, $S$ is the smallest closed set which contains
$Y$ with probability one; thus, since $\{y \mid g^T y = c\}$ is closed,
$S \subseteq \{y \mid g^T y = c\}$. Let $S^* = \{y_1 - y_2 \mid y_1, y_2 \in S\}$; then
$g^T s = 0$ for all $s$ in $S^*$. Now, $L^*$ is the set of all
linear combinations of elements of $S^*$; thus, $g^T \ell^* = 0$
for all $\ell^*$ in $L^*$.

A result like Lemma 3 is stated in Jennrich and
Moore (1975).

Corollary 1. Rank($\Sigma$) $= p\, -$ Rank($(b_1,\dots,b_k)^T$).

Corollary 2. Rank($\Sigma$) $< p$ if and only if there exists
a nonzero vector $b$ such that $b^T Y$ is almost surely constant.

Proof. Rank($\Sigma$) $< p$ if and only if
Rank($(b_1,\dots,b_k)^T$) $\ge 1$.

Corollary 3. Given a density of form (1) with
covariance matrix $\Sigma(\alpha)$, let $A$ be a matrix whose columns
form an orthonormal basis for the column space of $\Sigma(\alpha)$.
$A$ can be chosen to be independent of the parameters $\alpha$.
Hence the rank and the singularity of $\Sigma(\alpha)$ do not vary
with $\alpha$.

Proof. Apply Lemma 3 then Lemma 1.
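Corollaries 2 and 3 can be checked numerically on a small case. The sketch below assumes a single multinomial trial with three cells, so that $Y$ is a one-hot vector, $\mu$ is counting measure on the three unit vectors, and $\phi(\alpha) = \ln \sum_k \exp(\alpha_k)$; this setting and the helper names are illustrative assumptions, not taken from the text.

```python
import math

def softmax(alpha):
    """Cell probabilities p_k = exp(alpha_k - phi(alpha)) of a one-hot vector Y."""
    m = max(alpha)
    w = [math.exp(a - m) for a in alpha]
    total = sum(w)
    return [x / total for x in w]

def covariance(alpha):
    """Covariance matrix diag(p) - p p^T of the one-hot vector Y."""
    p = softmax(alpha)
    k = len(p)
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(k)]
            for i in range(k)]

def times(matrix, vector):
    """Matrix-vector product for small dense lists."""
    return [sum(row[j] * vector[j] for j in range(len(vector))) for row in matrix]

b = [1.0, 1.0, 1.0]  # b^T Y = 1 with probability one
products = [times(covariance(alpha), b)
            for alpha in ([0.0, 0.0, 0.0], [2.0, -1.0, 0.5])]
for sigma_b in products:
    print(sigma_b)  # each entry is zero up to rounding
```

The same vector $b$ annihilates the covariance matrix at both parameter values, which is the parameter-free singularity described in Corollary 3. Adding a constant to every component of $\alpha$ also leaves the cell probabilities unchanged.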

We now consider the convexity of $\phi(\cdot)$ on $A$.
Results on this topic may also be found in Berk (1972).

Theorem 1. $\phi(\cdot)$ is convex on $A$; furthermore,
$\phi(\cdot)$ is strictly convex on $A$ if and only if the covariance
matrix, $\Sigma(\alpha)$, of $(Y_1,\dots,Y_p)^T$ is full rank.

Proof. Let $\alpha, \alpha^*$ be in $A$; then for $0 < \lambda < 1$

$$\exp \phi(\lambda\alpha + (1-\lambda)\alpha^*) = \int [\exp(\alpha^T y)]^{\lambda}\,[\exp(\alpha^{*T} y)]^{1-\lambda}\,d\mu(y)$$
$$\le \left[\int \exp(\alpha^T y)\,d\mu(y)\right]^{\lambda} \left[\int \exp(\alpha^{*T} y)\,d\mu(y)\right]^{1-\lambda}$$

with equality holding if and only if $(\alpha-\alpha^*)^T Y$ is almost
surely ($\mu$) constant (see Royden (1968), page 113). Thus,
from (2), $\phi(\cdot)$ is convex on $A$.

Now, if $\phi(\cdot)$ is not strictly convex then there
exists $b = \alpha - \alpha^*$ with $\alpha, \alpha^*$ in $A$ such that $b^T Y$ is
almost surely ($\mu$) constant. Conversely, suppose there
exists a nonzero vector $b$ such that $b^T Y$ is almost surely
($\mu$) constant. Pick $\alpha^*$ in $A$ and let $\alpha = \alpha^* + b$. From
(2), $\phi(\alpha) < \infty$ and $\alpha$ is in $A$. Now, $(\alpha-\alpha^*)^T Y$ is
almost surely ($\mu$) constant, so $\phi(\cdot)$ is not strictly
convex. Thus $\phi(\cdot)$ is not strictly convex if and only
if there exists a nonzero vector $b$ such that $b^T Y$ is
almost surely ($\mu$) constant. The Theorem follows from
Corollary 2.


Corollary 4. Let $\ell(\cdot)$ be the log likelihood of
a sample of size $n$ taken from (1); then $\ell(\cdot)$ is concave
on $A$; furthermore, $\ell(\cdot)$ is strictly concave on $A$ if
and only if $\Sigma(\alpha)$ is full rank.

Proof. Let $y^1,\dots,y^n$ be a random sample from (1), then

$$\ell(\alpha) = \sum_{i=1}^{n} \alpha^T y^i - n\,\phi(\alpha) ;$$

thus

$$\ell(\lambda\alpha + (1-\lambda)\alpha^*) - (\lambda\ell(\alpha) + (1-\lambda)\ell(\alpha^*))$$
$$= -n\,[\phi(\lambda\alpha + (1-\lambda)\alpha^*) - (\lambda\phi(\alpha) + (1-\lambda)\phi(\alpha^*))] .$$

Corollary 5. Let $\hat\alpha$ maximize $\ell(\cdot)$ on $A$; then,
assuming $\Sigma(\alpha)$ is full rank, $\hat\alpha$ is unique.

Proof. $A$ is a convex set; thus, if there exist distinct
$\hat\alpha_1, \hat\alpha_2$ which maximize $\ell(\cdot)$ then $a\hat\alpha_1 + (1-a)\hat\alpha_2$ is in $A$
($0 < a < 1$), and from Corollary 4

$$\ell(a\hat\alpha_1 + (1-a)\hat\alpha_2) > a\,\ell(\hat\alpha_1) + (1-a)\,\ell(\hat\alpha_2) = \ell(\hat\alpha_1) ,$$

which is a contradiction.

Lemma 4 (Berk (1972)). $\phi(\cdot)$ is strictly convex on
$A$ if and only if $p(\cdot\,;\alpha_1) = p(\cdot\,;\alpha_2)$ implies $\alpha_1 = \alpha_2$ for
any $\alpha_1, \alpha_2$ in $A$.

Clearly $\Sigma(\alpha)$ need not be of full rank; however, in
the following we will show that if $\Sigma(\alpha)$ has rank $r < p$
then we may, by suitable transformation, obtain a family
of form (1) with $r$ parameters and covariance matrix of
full rank. This fact was mentioned in Berk (1972).

Theorem 2. Given a density, $p(y;\alpha)$, of form (1)
with covariance matrix of less than full rank, and the
matrix $A$ of Corollary 3, the transformed variables
$Z = A^T Y$ again have a density of form (1) but with natural
parameter $\beta = A^T \alpha$ and covariance matrix of full rank.
Further

$$p(y;\alpha) = f_Z(A^T y; A^T \alpha) .$$

Proof. Suppose $\Sigma(\alpha)$ has rank $r < p$. From Lemmas
1 and 3

$$b_i^T Y = c_i , \quad i = 1,\dots,p-r ,$$

almost surely ($\mu$). Let $B = (b_1,\dots,b_{p-r})$ and $C = (c_1,\dots,c_{p-r})^T$,
taking the $b_i$ orthonormal so that $(A:B)$ is an orthogonal matrix.
We may write

$$\alpha^T Y = \alpha^T (A:B)(A:B)^T Y$$
$$= \alpha^T A A^T Y + \alpha^T B B^T Y$$
$$= \alpha^T A A^T Y + \alpha^T B C ,$$

almost surely $\mu$. Now, from (2)

$$\phi(\alpha) = \alpha^T B C + \phi(A A^T \alpha) .$$

Substituting in (1) we get

$$p(y;\alpha) = \exp(\alpha^T A A^T y - \phi(A A^T \alpha)) .$$

Let $Z = A^T Y$ and $\mu^*$ be the measure defined by

$$\mu^*(B) = \mu(\{y \mid A^T y \in B\})$$

for all $r$-dimensional Borel sets $B$. Now, by the change
of variables theorem, see for example Lehmann (1959),

$$\int_{\{y \mid A^T y \in B\}} \exp(\alpha^T A A^T y - \phi(A A^T \alpha))\,d\mu(y)$$
$$= \int_B \exp(\alpha^T A z - \phi(A A^T \alpha))\,d\mu^*(z)$$
$$= \int_B \exp(\beta^T z - \phi^*(\beta))\,d\mu^*(z)$$

where $\beta = A^T \alpha$ and $\phi^*(\beta) = \phi(A\beta)$. Hence
$\exp(\beta^T z - \phi^*(\beta))$ is the density of $Z$ with respect to
$\mu^*$. This density is a member of the exponential family
with natural parameter $\beta$.

The covariance matrix of $Z$ is $A^T \Sigma A$, which is of
full rank.
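The reduction in Theorem 2 can be traced on a two-dimensional case of rank one. The sketch below assumes $Y = (W, 1-W)$ with $W$ taking values 0 and 1 and $\mu$ counting measure on the two support points, so that $b = (1,1)^T/\sqrt{2}$ and $A = (1,-1)^T/\sqrt{2}$; the setting and the function names are hypothetical choices made for illustration.

```python
import math

ROOT2 = math.sqrt(2.0)
A = (1.0 / ROOT2, -1.0 / ROOT2)      # orthonormal basis of Col(Sigma)
SUPPORT = [(1.0, 0.0), (0.0, 1.0)]   # Y = (W, 1 - W), W in {0, 1}

def phi(alpha):
    """Log moment generating function of counting measure on SUPPORT."""
    return math.log(sum(math.exp(alpha[0] * y[0] + alpha[1] * y[1])
                        for y in SUPPORT))

def log_p(y, alpha):
    """Log density (1) in the original, rank-deficient parameterization."""
    return alpha[0] * y[0] + alpha[1] * y[1] - phi(alpha)

def log_f_z(z, beta):
    """Log density of Z = A^T Y, using phi*(beta) = phi(A beta)."""
    return beta * z - phi((A[0] * beta, A[1] * beta))

def project(v):
    """A^T v for a 2-vector v."""
    return A[0] * v[0] + A[1] * v[1]

alpha = (0.7, -0.4)
# p(y; alpha) and f_Z(A^T y; A^T alpha) should agree on the support.
gaps = [log_p(y, alpha) - log_f_z(project(y), project(alpha)) for y in SUPPORT]

# Variance of Z under alpha: A^T Sigma A, a 1 x 1 matrix here.
prob = [math.exp(log_p(y, alpha)) for y in SUPPORT]
mean_z = sum(p * project(y) for p, y in zip(prob, SUPPORT))
var_z = sum(p * (project(y) - mean_z) ** 2 for p, y in zip(prob, SUPPORT))
print(gaps, var_z)
```

The two log densities agree at both support points, and the variance of the single reduced variable is strictly positive, so the reduced family has a full rank covariance matrix as the theorem states.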

Next, let

(3)  $g(x;\theta) = \exp\left[\sum_{k=1}^{m} \eta_k(\theta)\,\psi_k(x) - Q(\theta)\right]$

be a density, with respect to a $\sigma$-finite measure $\mu$,
where $\theta \in \Theta \subseteq E^p$. Such densities are said to be members
of the exponential family. The following Theorem, a
statement of which may be found in Berk (1972), relates
the exponential family and the exponential family with
natural parameterization.


Theorem 3. $Y = \psi(X)$ has density

$$f_Y(y;\eta(\theta)) = \exp\left[\sum_{k=1}^{m} \eta_k(\theta)\,y_k - \phi(\eta(\theta))\right] ,$$

a member of the exponential family with natural parameter
$\eta(\theta)$. Further,

$$g(x;\theta) = f_Y(\psi(x);\eta(\theta)) .$$

Proof. Consider the transformation $Y_k = \psi_k(X)$,
$k = 1,\dots,m$, and let $\xi(A) = \mu\{x \mid \psi(x) \in A\}$. By the change
of variables theorem,

$$\int_{\psi^{-1}(A)} \exp\left(\sum_{k=1}^{m} \eta_k(\theta)\,\psi_k(x) - Q(\theta)\right) d\mu(x)
= \int_A \exp\left(\sum_{k=1}^{m} \eta_k(\theta)\,y_k - Q(\theta)\right) d\xi(y) ;$$

and

$$\exp\left(\sum_{k=1}^{m} \eta_k(\theta)\,y_k - Q(\theta)\right)$$

is the density of $(Y_1,\dots,Y_m)$ with respect to the measure $\xi$.
Now

$$\int \exp\left(\sum_{k=1}^{m} \eta_k(\theta)\,y_k - Q(\theta)\right) d\xi(y) = 1$$

implies $Q(\theta) = \phi(\eta(\theta))$ where $\phi(\cdot)$ is the log moment
generating function of $\xi$. Substituting in the density
of $Y$, and then in (3), we obtain the Theorem.


Corollary 6. Let $A$ be a matrix whose columns
form an orthonormal basis for the column space of the
covariance matrix of $\psi(X)$. $Z = A^T \psi(X)$ has density
$f_Z(z;\beta) = \exp(\beta^T z - \phi^*(\beta))$, a member of the exponential
family with natural parameter $\beta = A^T \eta(\theta)$. The covariance
matrix of $Z$ has full rank. Further

$$g(x;\theta) = f_Z(A^T \psi(x); A^T \eta(\theta)) .$$

Proof. Apply Theorem 3, then Theorem 2.

In summary, applying Theorem 3 and then Corollary 4,
the log likelihood for a random sample taken from a family
of form (3) may be written as $\ell(\eta(\theta))$ where $\ell(\cdot)$ is
concave. Also, from Corollary 4, $\ell(\cdot)$ is strictly
concave if and only if the covariance matrix of $Y = \psi(X)$
is full rank. A density of form (3) will be said to be in
canonical form if the covariance matrix of $\psi(X)$ is of
full rank.

If the density of $X$ is not in canonical form then
from Corollary 6 we may write the log likelihood of the
sample as $\ell(A^T \eta(\theta))$ where $\ell(\cdot)$ is strictly concave.

3. UNIQUENESS OF MAXIMUM LIKELIHOOD IN THE EXPONENTIAL
FAMILY

Let us now consider the maximum likelihood problem
for families of the form (3). Suppose that $\hat\theta$ is a
maximum likelihood estimate for $\theta$. When will $\hat\theta$ be
unique? Huzurbazar (1949) demonstrates the uniqueness of
$\hat\theta$ in families of the form (3) where $m = p$. However,
from the following example, we see that $m = p$ is not
necessary for the uniqueness of $\hat\theta$.

Example 1 (Charnes et al. (1975)). We wish to
model the effect of radiation on bacteria in suspension.
For each radiation dose level several dilutions will be
placed on petri dishes and the number of resulting colonies
counted.

Let

$X_{i1}$ = concentration of bacteria in suspension,

$X_{i2}$ = radiation dose,

$n_i$ = number of dilutions observed at the $i$th
dose level,

$Y_{ij}$ = number of colonies counted for the
$j$th dilution.

We assume that $Y_{ij}$ ($j = 1,\dots,n_i$, $i = 1,\dots,N$) are independent
Poisson distributed random variables, $Y_{ij}$ having expected
value

$$\theta_1 X_{i1} \exp(-\theta_2 X_{i2}) .$$

The parameter $\theta_1$ represents the number of colonies
forming, per unit volume of suspension, when no radiation
is present; $\theta_2$ describes the radiation sensitivity of
the bacteria.

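We will estimate the parameters by maximum likelihood.
The log likelihood for a sample $y_{ij}$ ($j = 1,\dots,n_i$, $i = 1,\dots,N$) is,
up to an additive constant,

$$\sum_{i=1}^{N} \sum_{j=1}^{n_i} \left[ y_{ij}\,(\ln(\theta_1 X_{i1}) - \theta_2 X_{i2}) - \theta_1 X_{i1} \exp(-\theta_2 X_{i2}) \right] ,$$

or, employing the dot notation of the analysis of variance,

(4)  $\sum_{i=1}^{N} \left[ y_{i\cdot}\,(\ln(\theta_1 X_{i1}) - \theta_2 X_{i2}) - n_i \theta_1 X_{i1} \exp(-\theta_2 X_{i2}) \right]$.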

Now, this likelihood is one from a family of the
form (3) where $m = N$ and $p = 2$. If (4) has a maximum,
is it unique? The likelihood equations for (4) are

(5)  $\sum_{i=1}^{N} \left[ \dfrac{y_{i\cdot}}{\theta_1} - n_i X_{i1} \exp(-\theta_2 X_{i2}) \right] = 0$

and

(6)  $\sum_{i=1}^{N} \left[ -y_{i\cdot} X_{i2} + n_i \theta_1 X_{i1} X_{i2} \exp(-\theta_2 X_{i2}) \right] = 0$.

From (5)

(7)  $\theta_1 = \sum_{i=1}^{N} y_{i\cdot} \Big/ \sum_{i=1}^{N} n_i X_{i1} \exp(-\theta_2 X_{i2})$;

substituting into (6) we have

(8)  $\dfrac{\sum_{i=1}^{N} n_i X_{i1} \exp(-\theta_2 X_{i2})}{\sum_{i=1}^{N} n_i X_{i1} X_{i2} \exp(-\theta_2 X_{i2})} = \dfrac{\sum_{i=1}^{N} y_{i\cdot}}{\sum_{i=1}^{N} y_{i\cdot} X_{i2}}$.

Let

$$g(\theta_2) = \frac{\sum_{i=1}^{N} n_i X_{i1} \exp(-\theta_2 X_{i2})}{\sum_{i=1}^{N} n_i X_{i1} X_{i2} \exp(-\theta_2 X_{i2})} ;$$

then

$$g'(\theta_2) = \frac{-\left(\sum_{i=1}^{N} n_i X_{i1} X_{i2} \exp(-\theta_2 X_{i2})\right)^2 + \sum_{i=1}^{N} n_i X_{i1} \exp(-\theta_2 X_{i2}) \sum_{i=1}^{N} n_i X_{i1} X_{i2}^2 \exp(-\theta_2 X_{i2})}{\left(\sum_{i=1}^{N} n_i X_{i1} X_{i2} \exp(-\theta_2 X_{i2})\right)^2} .$$

Letting $a_i = \sqrt{n_i X_{i1} \exp(-\theta_2 X_{i2})}$ and $b_i = X_{i2}\, a_i$, then
by the Schwarz inequality

$$\left(\sum_{i=1}^{N} a_i b_i\right)^2 \le \sum_{i=1}^{N} a_i^2 \sum_{i=1}^{N} b_i^2$$

with equality holding only if $b_i / a_i = X_{i2}$ is constant for
all $i$; thus, assuming $X_{i2}$ is not constant in $i$,
$g'(\theta_2) > 0$. Therefore, if a solution to the likelihood
equations exists, it is unique.

Given a solution to the likelihood equations we will
show that it is a local, and thus unique global, maximum by
showing that the Hessian of the likelihood is negative
definite when evaluated at the solution. A matrix is
positive definite if and only if its principal minors are
positive (see Noble (1969), page 395). The principal minors
of the negative of the Hessian of the likelihood evaluated
at the solution of the likelihood equations are

$$\frac{1}{\theta_1^2} \sum_{i=1}^{N} y_{i\cdot}$$

and

$$\left(\frac{1}{\theta_1^2} \sum_{i=1}^{N} y_{i\cdot}\right) \left(\theta_1 \sum_{i=1}^{N} n_i X_{i1} X_{i2}^2 \exp(-\theta_2 X_{i2})\right) - \left(\sum_{i=1}^{N} n_i X_{i1} X_{i2} \exp(-\theta_2 X_{i2})\right)^2 .$$

The first principal minor is greater than zero and, by
replacing $\theta_1$ by the expression given by (7), the second
is equal to

$$\sum_{i=1}^{N} n_i X_{i1} \exp(-\theta_2 X_{i2}) \sum_{i=1}^{N} n_i X_{i1} X_{i2}^2 \exp(-\theta_2 X_{i2}) - \left(\sum_{i=1}^{N} n_i X_{i1} X_{i2} \exp(-\theta_2 X_{i2})\right)^2 ,$$

which is also greater than zero; thus the Hessian of the
likelihood evaluated at the solution of the likelihood
equations is negative definite.

In  summary  we  have  shown  that  given  a solution  to 
the  likelihood  equations  (5)  and  (6)  this  solution  is  a 
unique  global  maximum  of  the  likelihood  (4). 

To  demonstrate  the  existence  of  solutions  in  a 
numerical  example  we  consider  the  following  data: 


 i    X_i1   X_i2   n_i    y_ij
 1      1      0     6     299  283  280  246  264
 2      1      1     2     169  184
 3      2      2     5     179  224  188  202  194
 4      4      3     5     233  261  229  286  264
 5     10      4     4     401  410  356  388
 6      4      5     5     157  146  134  161  159

Using a search technique to solve (8) we obtain $\hat\theta_2 = .4459$
and, by evaluating (7) at $\hat\theta_2$, we have $\hat\theta_1 = 256.9$.
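The search for a root of (8) can be sketched with plain bisection, since $g$ is increasing whenever $X_{i2}$ varies with $i$. The first dose level is listed with $n_1 = 6$ but only five counts appear in the text as available here, so the sketch below assumes that $n_i$ equals the number of counts listed for each level; its output therefore need not agree exactly with the estimates reported above. The bracketing interval and the function names are also assumptions.

```python
import math

# Columns of the data table: X_i1, X_i2, and the counts y_ij at each dose level.
# Assumption: n_i is taken as the number of counts listed for level i.
LEVELS = [
    (1, 0, [299, 283, 280, 246, 264]),
    (1, 1, [169, 184]),
    (2, 2, [179, 224, 188, 202, 194]),
    (4, 3, [233, 261, 229, 286, 264]),
    (10, 4, [401, 410, 356, 388]),
    (4, 5, [157, 146, 134, 161, 159]),
]

def weighted_sum(theta2, power):
    """sum_i n_i X_i1 X_i2^power exp(-theta2 X_i2)."""
    return sum(len(y) * x1 * x2 ** power * math.exp(-theta2 * x2)
               for x1, x2, y in LEVELS)

def g(theta2):
    """Left-hand side of (8)."""
    return weighted_sum(theta2, 0) / weighted_sum(theta2, 1)

y_total = sum(sum(y) for _, _, y in LEVELS)
rhs = y_total / sum(x2 * sum(y) for _, x2, y in LEVELS)  # right-hand side of (8)

# g is increasing when X_i2 is not constant, so bisection brackets the root.
low, high = 0.0, 5.0
for _ in range(200):
    mid = 0.5 * (low + high)
    if g(mid) < rhs:
        low = mid
    else:
        high = mid
theta2_hat = 0.5 * (low + high)
theta1_hat = y_total / weighted_sum(theta2_hat, 0)  # equation (7)
print(theta2_hat, theta1_hat)
```

Bisection is used here only because the monotonicity of $g$ guarantees a single sign change on the bracket; any one-dimensional root finder would serve equally well.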

Now, returning to the general discussion, suppose
$\hat\theta$ is a maximum likelihood estimate. The log likelihood for
a sample of size $n$ taken from a population of form (3) is
$\ell(\eta(\theta))$, where

$$\ell(\cdot) = \sum_{i=1}^{n} \ln f_Y(\psi(x_i)\,;\,\cdot\,) .$$

From this we observe:

Theorem 4. $\hat\theta$ is unique if and only if

(i)  $\eta(\hat\theta)$ is the unique maximum of $\ell(\cdot)$ on the
range of $\eta(\cdot)$, and

(ii)  there exists no other $\theta$ such that $\eta(\theta) = \eta(\hat\theta)$.

The following example illustrates the use of Theorem
4 in the logistic model of Thompson (1976).

Example 2 (Logistic Life Study Model). The log
likelihood of the logistic life study model is

$$L(\beta,\eta) = \sum_{j=1}^{m} \sum_{i \in V_j \cup S_j} \left[ y_{ij}\,(z_{ij}\beta + \eta_j) - \ln(1 + \exp(z_{ij}\beta + \eta_j)) \right]$$

where $z_{ij}$ is a vector of variables for the $i$th individual
in the $j$th time interval, $S_j$ is the set of survivors of
the $j$th time interval, $V_j$ is the set of failures in the
$j$th time interval, $y_{ij}$ equals 1 if the $i$th individual is
in $V_j$ and 0 if the $i$th individual is in $S_j$, and
$(\beta^T, \eta_1,\dots,\eta_m)$ is a vector of unknown parameters to be
estimated.

Now, $L(\beta,\eta)$ is the log likelihood of a density
which is a member of the exponential family and in this
case the function $\eta(\cdot)$ is the linear function determined
by $z_{ij}\beta + \eta_j$, $i \in V_j \cup S_j$, $j = 1,\dots,m$, and

$$\ell(\alpha) = \sum_{j=1}^{m} \sum_{i \in V_j \cup S_j} \left[ y_{ij}\,\alpha_{ij} - \ln(1 + \exp(\alpha_{ij})) \right] .$$

For the logistic model, hypothesis (i) of Theorem 4
holds since we are maximizing a strictly concave function
over a linear manifold; and hypothesis (ii) becomes
necessary and sufficient for a maximum likelihood estimate,
$(\hat\beta^T, \hat\eta_1,\dots,\hat\eta_m)$, to be unique.

Thus, $(\hat\beta^T : \hat\eta_1 \dots \hat\eta_m)$ will be a unique maximum
likelihood estimate if and only if the matrix $Z$ of the linear
function $\eta(\cdot)$ is full rank.
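The rank condition of Example 2 is easy to test for a given layout. The sketch below assumes that each individual at risk in interval $j$ contributes one row consisting of the covariates $z_{ij}$ followed by an indicator of interval $j$, which is the matrix of the linear function $z_{ij}\beta + \eta_j$; the two small layouts and the helper names are hypothetical.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a small matrix by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def design(z_by_interval):
    """Rows (z_ij, e_j): covariates followed by the interval indicator."""
    m = len(z_by_interval)
    return [list(z) + [1 if k == j else 0 for k in range(m)]
            for j, zs in enumerate(z_by_interval) for z in zs]

# Hypothetical layouts with two intervals and two covariates per individual.
varying = design([[(0, 1), (1, 0), (1, 1)], [(0, 1), (1, 0)]])
dummy_coded = design([[(1, 0), (0, 1), (1, 0)], [(0, 1), (1, 0)]])

print(rank(varying), rank(dummy_coded))
```

In the second layout the two covariates always sum to one, which equals the sum of the interval indicators, so the four columns are linearly dependent and the maximum likelihood estimate cannot be unique. Exact rational arithmetic is used to avoid tolerance choices in the rank computation.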

Though Theorem 4 was stated with the exponential
family in mind, we may apply it to any problem where the
density is a composite function. In the following example,
we consider a modified logistic model, applying Theorem 4
to obtain conditions for the uniqueness of maximum
likelihood estimates.

Example 3 (Modified Logistic Life Study). In
Thompson (1976) items censored in an interval were considered
to be not at risk in the interval; thus, no contribution
to the likelihood was obtained from the interval in which
an individual was censored. Thompson (1977) considers a
modification of the logistic model to obtain information
from the interval in which censoring occurred.

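The log likelihood of the modified logistic model is

$$L(\beta,\eta) = \sum_{j=1}^{m} \Big[ \sum_{i \in V_j \cup S_j} y_{ij}\,(z_{ij}\beta + \eta_j) - \sum_{i \in V_j \cup S_j} \ln(1 + \exp(z_{ij}\beta + \eta_j)) - 2 \sum_{i \in L_j} \ln(1 + \exp(z_{ij}\beta + \eta_j)) \Big] ,$$

where $L_j$ is the set of individuals censored in the $j$th
interval. Now,

$$L(\beta,\eta) = \ell(z_{ij}\beta + \eta_j \,;\; i \in V_j \cup S_j \cup L_j ,\; j = 1,\dots,m)$$

where

$$\ell(\alpha) = \sum_{j=1}^{m} \Big[ \sum_{i \in V_j \cup S_j} y_{ij}\,\alpha_{ij} - \sum_{i \in V_j \cup S_j} \ln(1 + \exp \alpha_{ij}) - 2 \sum_{i \in L_j} \ln(1 + \exp \alpha_{ij}) \Big] ;$$

thus, if $\ell(\cdot)$ is strictly concave then, by applying the
same reasoning as in Example 2, a maximum likelihood
estimate will be unique if and only if the matrix $Z$ is
full rank. To show that $\ell(\cdot)$ is strictly concave note
that

$$\frac{\partial^2 \ell(\alpha)}{\partial \alpha_{kl}\,\partial \alpha_{ij}} =
\begin{cases}
0 , & i \ne k \text{ or } j \ne l , \\[4pt]
-\dfrac{\exp \alpha_{ij}}{(1 + \exp \alpha_{ij})^2} , & i = k,\ j = l,\ i \in V_j \cup S_j , \\[8pt]
-\dfrac{2 \exp \alpha_{ij}}{(1 + \exp \alpha_{ij})^2} , & i = k,\ j = l,\ i \in L_j .
\end{cases}$$

Thus, the matrix of second order partials of $\ell(\cdot)$ is
negative definite, and hence (see Roberts and Varberg (1973),
page 103), $\ell(\cdot)$ is strictly concave.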

Conditions  (i)  and  (ii)  of  Theorem  4 are  difficult 
to  verify  when  working  with  a particular  problem;  therefore, 
it  is  desirable  to  find  conditions  sufficient  for  both  (i) 
and  (ii)  which  are  more  tractable. 

Consider (i). A necessary and sufficient condition
for $\eta(\hat\theta)$ to be unique is that

$$\eta(\Theta) \cap \{\alpha \mid \ell(\alpha) \ge \ell(\eta(\hat\theta))\} = \{\eta(\hat\theta)\} .$$

In the case that $g(x;\theta)$ is in canonical form and
$\nabla\ell(\eta(\hat\theta))$ exists, a sufficient condition for (i) may be
established using the following well known fact (see, for
example, Roberts and Varberg (1973), page 98):

Lemma 5. Let $\ell(\cdot)$ be strictly concave on $A$ and
differentiable at $\alpha_1$; then

$$\ell(\alpha_2) \ge \ell(\alpha_1) \quad (\alpha_2 \ne \alpha_1)$$

implies

$$(\alpha_2 - \alpha_1)^T \nabla\ell(\alpha_1) > 0 .$$

Theorem 5. If $(\eta(\theta) - \eta(\hat\theta))^T \nabla\ell(\eta(\hat\theta)) \le 0$ for all
$\theta$ then $\eta(\hat\theta)$ is unique.

Proof. By assumption

$$\eta(\Theta) \subseteq \{\alpha \mid (\alpha - \eta(\hat\theta))^T \nabla\ell(\eta(\hat\theta)) \le 0\}$$

and from Lemma 5

$$\{\alpha \mid \ell(\alpha) \ge \ell(\eta(\hat\theta)) \text{ and } \alpha \ne \eta(\hat\theta)\} \subseteq \{\alpha \mid (\alpha - \eta(\hat\theta))^T \nabla\ell(\eta(\hat\theta)) > 0\} ;$$

thus, $\eta(\Theta) \cap \{\alpha \mid \ell(\alpha) \ge \ell(\eta(\hat\theta))\} = \{\eta(\hat\theta)\}$.


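We will illustrate Theorem 5 using Example 1.
The covariance matrix of $Y_{1\cdot},\dots,Y_{N\cdot}$ is a matrix with
diagonal terms $n_i \theta_1 X_{i1} \exp(-\theta_2 X_{i2})$ and off diagonal terms
zero; thus, provided $X_{i1}$ is not zero for any $i$, the
density of $Y_{1\cdot},\dots,Y_{N\cdot}$ is in canonical form. Now

$$\eta(\theta) = (\ln(\theta_1 X_{11}) - \theta_2 X_{12}, \dots, \ln(\theta_1 X_{N1}) - \theta_2 X_{N2})^T$$

and

$$\ell(\alpha) = \sum_{i=1}^{N} (y_{i\cdot}\,\alpha_i - n_i \exp \alpha_i) ;$$

thus,

$$(\eta(\theta) - \eta(\hat\theta))^T = \left( \ln\frac{\theta_1}{\hat\theta_1} + (\hat\theta_2 - \theta_2) X_{12}, \dots, \ln\frac{\theta_1}{\hat\theta_1} + (\hat\theta_2 - \theta_2) X_{N2} \right)$$

and

$$\nabla\ell(\eta(\hat\theta)) = \left( y_{1\cdot} - n_1 \hat\theta_1 X_{11} \exp(-\hat\theta_2 X_{12}), \dots, y_{N\cdot} - n_N \hat\theta_1 X_{N1} \exp(-\hat\theta_2 X_{N2}) \right)^T .$$

Therefore

$$(\eta(\theta) - \eta(\hat\theta))^T \nabla\ell(\eta(\hat\theta)) = \ln\frac{\theta_1}{\hat\theta_1} \sum_{i=1}^{N} \left( y_{i\cdot} - n_i \hat\theta_1 X_{i1} \exp(-\hat\theta_2 X_{i2}) \right) + (\hat\theta_2 - \theta_2) \sum_{i=1}^{N} X_{i2} \left( y_{i\cdot} - n_i \hat\theta_1 X_{i1} \exp(-\hat\theta_2 X_{i2}) \right) .$$

Now, from (5)

$$\sum_{i=1}^{N} \left( y_{i\cdot} - n_i \hat\theta_1 X_{i1} \exp(-\hat\theta_2 X_{i2}) \right) = 0$$

and from (7) and (8)

$$\sum_{i=1}^{N} X_{i2} \left( y_{i\cdot} - \hat\theta_1 n_i X_{i1} \exp(-\hat\theta_2 X_{i2}) \right) = 0 ;$$

thus, from Theorem 5, $\eta(\hat\theta)$ is unique.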

An assumption stronger than (ii) is that $\eta(\cdot)$ be
one to one; this is the case in Example 1.
In fact, $\eta(\theta^*) = \eta(\theta)$ implies

$$\theta_1^* X_{i1} \exp(-\theta_2^* X_{i2}) = \theta_1 X_{i1} \exp(-\theta_2 X_{i2}) , \quad i = 1,\dots,N$$

or

$$\frac{\theta_1^*}{\theta_1} \exp((\theta_2 - \theta_2^*) X_{i2}) = 1 , \quad i = 1,\dots,N .$$

Thus, assuming $X_{i2}$ is not constant in $i$, $\theta_2^* = \theta_2$ and
$\theta_1^* = \theta_1$.

In summary, through application of Theorem 5 and by
showing that $\eta(\cdot)$ is one to one, we have shown that
hypotheses (i) and (ii) of Theorem 4 hold; thus, the
maximum likelihood estimate for the parameters in Example 1
is unique.

4. ESTIMABLE FUNCTIONS AND TESTABLE HYPOTHESES FOR THE
NORMAL LINEAR MODEL

From Example 2 we see that the likelihood in Thompson
(1976) will admit a unique maximum likelihood estimate if
and only if the linear transformation determined by the
matrix of covariates is one to one. Therefore, to look at
inference problems for the logistic model when the maximum
likelihood estimates are not unique, we will consider
members of the exponential family in which $\eta(\cdot)$ is not a
one to one function. In this case, problems of identification,
as discussed in Koopmans and Reiersøl (1950), arise.

Before we discuss the identification problem in
general, let us consider another example of a member of the
exponential family where $\eta(\cdot)$ is not one to one: the
normal linear model of less than full rank.

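In the normal linear model we have an $n \times 1$
dimensional random vector, $Y$, which we express as

$$Y = X\beta + \varepsilon ,$$

where $X$ is an $n \times p$ matrix of known values, $\beta$ is a
$p \times 1$ vector of unknown parameters, and $\varepsilon$ is an $n \times 1$
vector of errors distributed as a multivariate normal with
mean 0 and variance $\sigma^2 I$. The log likelihood for $y$ is,
up to an additive constant,

$$-\frac{(y - X\beta)^T (y - X\beta)}{2\sigma^2} - \frac{n}{2} \ln \sigma^2$$

or

$$\frac{\beta^T X^T y}{\sigma^2} - \frac{y^T y}{2\sigma^2} - \frac{\beta^T X^T X \beta}{2\sigma^2} - \frac{n}{2} \ln \sigma^2 ;$$

thus, letting

$$\eta(\beta,\sigma^2) = \left( \frac{\beta^T X^T}{\sigma^2} : -\frac{1}{2\sigma^2} \right)^T ,$$

$$\psi(y) = (y^T : y^T y)^T$$

and

$$Q(\beta,\sigma^2) = \frac{\beta^T X^T X \beta}{2\sigma^2} + \frac{n}{2} \ln \sigma^2 ,$$

we may write the log likelihood for $y$ as

$$\eta(\beta,\sigma^2)^T \psi(y) - Q(\beta,\sigma^2) .$$

Now, $\eta(\beta^*,\sigma^{*2}) = \eta(\beta,\sigma^2)$ if and only if $\sigma^{*2} = \sigma^2$
and $X(\beta^* - \beta) = 0$; thus, $\eta(\cdot,\cdot)$ is one to one if and
only if $X$ is full rank. Therefore, the normal linear
model of less than full rank is a member of the exponential
family for which $\eta(\cdot)$ is not one to one.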

In  the  normal  linear  model  the  concepts  of  estimable 

functions  and  testable  hypotheses  are  introduced  to  remedy 

problems  caused  by  X being  less  than  full  rank. 


We will denote the null space of the matrix $M$ (the
set of solutions to $Mx = 0$) by Null($M$) and the row space
of $M$ (the set of all linear combinations of the rows of
$M$) by Row($M$).

An estimable function is defined as follows:
A linear function of $\beta$ is estimable if and only if there
is a linear function of $Y$ which is an unbiased estimate
of it. From this definition we have the following:

Theorem 6. $\lambda^T \beta$ is estimable if and only if
$\lambda = X^T r$ for some $r$.

Proof. If $\lambda^T \beta$ is estimable then there exists
a vector $r$ such that $E(r^T Y) = r^T X \beta = \lambda^T \beta$ for all $\beta$;
thus, $\lambda = X^T r$.

Conversely, $\lambda = X^T r$ implies $E(r^T Y) = \lambda^T \beta$.

We may restate Theorem 6 as: $\lambda^T \beta$ is estimable if
and only if $\lambda^T$ is in Row($X$).

Now $\hat\beta$ is a maximum likelihood estimate for $\beta$ if
and only if $\hat\beta$ solves the normal equations,

$$X^T X \hat\beta = X^T Y .$$

One important property of estimable functions is
given by the following result:

Theorem 7. $\lambda^T \beta$ is estimable if and only if $\lambda^T \hat\beta$
is constant for all $\hat\beta$ maximizing the likelihood.

Proof. Rao (1965, page 181).

Another important property of estimable functions
concerns tests of hypotheses. A hypothesis, $H$, stating
that $\beta$ is in $S = \{\beta \mid \lambda_i^T \beta = m_i ,\ i = 1,\dots,\ell\}$ is called
testable if and only if $\lambda_i^T \beta$ is estimable for each $i$.
Without loss of generality we will assume that the $\lambda_i$'s
are linearly independent.

Searle (1971) discusses testable hypotheses, showing
that the sum of squares error under a nontestable hypothesis,
where all $\lambda_i^T \beta$ are not estimable, is equal to the sum of
squares due to error. Let $\sim\!H$ be the hypothesis stating
that $\beta$ is in $S^c$. Seely (1977) shows the intersection of
the sets of expected values under the null $H$ and the
alternative $\sim\!H$ is empty if and only if $H$ is testable.
The following is a version of Seely's result.

Theorem 8.

$$XS \cap XS^c = \emptyset$$

if and only if $\lambda_i^T \beta$ is estimable for each $i$.

Proof. Suppose for some $i$, $\lambda_i^T \beta$ is not estimable;
then, from Theorem 6, $\lambda_i$ is not in Row($X$); thus, there
exists a $\beta^*$ in Null($X$) for which $\lambda_i^T \beta^* \ne 0$. Let $\beta_1$
be in $S$; then $\beta_1 + \beta^*$ is in $S^c$; however,

$$X(\beta_1 + \beta^*) = X\beta_1 + X\beta^* = X\beta_1 .$$

Therefore,

$$XS \cap XS^c \ne \emptyset .$$

Conversely, if $XS \cap XS^c \ne \emptyset$ then there exist $\beta_1$ in $S$
and $\beta_2$ in $S^c$ such that $X\beta_1 = X\beta_2$; thus, $\beta_2 - \beta_1$ is
in Null($X$).

Now, for at least one $i$, $\lambda_i^T \beta_1 \ne \lambda_i^T \beta_2$; thus,
$\lambda_i^T (\beta_2 - \beta_1) \ne 0$. Therefore, $\lambda_i$ is not in Row($X$) and,
from Theorem 6, $\lambda_i^T \beta$ is not estimable.
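Theorems 6 through 8 can be seen at work in a small design. The sketch below assumes a hypothetical one-way layout with two groups and parameter vector $\beta = (\mu, \tau_1, \tau_2)^T$, for which Null($X$) is spanned by $(1, -1, -1)^T$; it checks estimability by orthogonality to that null vector, which is equivalent to membership in Row($X$). All names and numbers are illustrative.

```python
# Hypothetical one-way layout with two groups: E(Y) = mu + tau_g.
# Columns of X correspond to beta = (mu, tau_1, tau_2).
X = [[1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 0, 1]]
NULL_VECTOR = [1, -1, -1]  # spans Null(X): X times this vector is zero

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def fitted(beta):
    """X beta, the vector of expected values."""
    return [dot(row, beta) for row in X]

def is_estimable(lam):
    """lam is in Row(X) exactly when it is orthogonal to Null(X)."""
    return dot(lam, NULL_VECTOR) == 0

contrast = [0, 1, -1]   # tau_1 - tau_2
single = [0, 1, 0]      # tau_1 alone

beta_1 = [5, 0, 2]                                          # satisfies tau_1 = 0
beta_2 = [b + 3 * v for b, v in zip(beta_1, NULL_VECTOR)]   # same X beta, tau_1 = -3

print(is_estimable(contrast), is_estimable(single))
print(fitted(beta_1) == fitted(beta_2), dot(single, beta_1), dot(single, beta_2))
```

Here $\beta_1$ lies in $S = \{\beta \mid \tau_1 = 0\}$ and $\beta_2$ lies in $S^c$, yet both give the same expected values, so the hypothesis $\tau_1 = 0$ is not testable in this layout. The contrast $\tau_1 - \tau_2$ takes the same value at both points, as Theorem 7 requires of an estimable function.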

The next theorem gives a more exact relationship between $XS$ and $XS^c$.

Theorem 9. If $\lambda_i^T\beta$ is not estimable for at least one i then $XS \subset XS^c = \mathrm{Col}(X)$. Furthermore, if $\lambda_i^T\beta$ is not estimable for any i then $XS = XS^c$.

Proof. Suppose for some i that $\lambda_i^T\beta$ is not estimable; then, from Theorem 6, $\lambda_i$ is not in Row(X); thus, there is a $\delta$ in Null(X) such that $\lambda_i^T\delta \ne 0$.

Let $\beta$ be in S; then

$$\lambda_i^T(\beta + \delta) = \lambda_i^T\beta + \lambda_i^T\delta = m_i + \lambda_i^T\delta \ne m_i ;$$

thus, $\beta + \delta$ is in $S^c$. Now,

$$X(\beta + \delta) = X\beta + X\delta = X\beta ;$$

hence, $X\beta$ is in $XS^c$. Therefore, $XS \subset XS^c$.

Now, $\mathrm{Col}(X) = XS \cup XS^c = XS^c$.

Suppose $\lambda_i^T\beta$ is not estimable for any i; then, from Theorem 6, for each i, $\lambda_i$ is linearly independent of the rows of X. Now, since the vectors $\lambda_i$, $i = 1,\dots,\ell$, are linearly independent, the equations in the variable $\eta$,

$$\lambda_i^T\eta = m_i - \lambda_i^T\beta , \quad i = 1,\dots,\ell$$

and

$$X\eta = 0$$

have at least one solution for all $\beta$. Hence, for $\beta^*$ in $S^c$ there exists $\eta^*$ such that $\beta^* + \eta^*$ is in S and

$$X(\beta^* + \eta^*) = X\beta^* .$$

Therefore, $XS^c \subset XS$ and the theorem follows.

We may extend the results of Theorem 8 to hypotheses involving inequality constraints.

Theorem 10. Let

$$S = \{\beta \mid \lambda_i^T\beta = m_i,\ i = 1,\dots,s \ \text{and}\ \lambda_i^T\beta \ge m_i,\ i = s+1,\dots,\ell\} ;$$

then $XS \cap XS^c = \emptyset$ if and only if $\lambda_i^T\beta$ is estimable for each i.

Proof. If $\lambda_i^T\beta$ is not estimable for some i, then, as in the proof of Theorem 9, there exists a $\delta$ in Null(X) such that $\lambda_i^T\delta \ne 0$. Let $\beta^*$ be in S; then we may find a real number r such that $\beta^* + r\delta$ is in $S^c$. Now,

$$X(\beta^* + r\delta) = X\beta^* + rX\delta = X\beta^* ;$$

thus, $XS \cap XS^c \ne \emptyset$. The proof of the converse is the same as that of the converse of Theorem 8.
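
The construction used in the proofs of Theorems 8 and 9 can be checked numerically. The following is only an illustrative sketch: the design matrix (a two-group one-way layout with columns for an overall mean and two group effects), the helper name `is_estimable`, and all numerical values are assumptions made for the example, not part of the original text. Estimability is tested by the row-space criterion of Theorem 6.

```python
import numpy as np

def is_estimable(X, lam, tol=1e-10):
    """Row-space test: lam'beta is estimable iff appending lam to X keeps the rank."""
    X = np.asarray(X, dtype=float)
    lam = np.asarray(lam, dtype=float)
    stacked = np.vstack([X, lam])
    return np.linalg.matrix_rank(stacked, tol=tol) == np.linalg.matrix_rank(X, tol=tol)

# Hypothetical less-than-full-rank design: columns are (mean, group 1, group 2).
X = np.array([[1, 1, 0],
              [1, 1, 0],
              [1, 0, 1],
              [1, 0, 1]], dtype=float)

contrast = np.array([0.0, 1.0, -1.0])   # group 1 effect minus group 2 effect
single = np.array([0.0, 1.0, 0.0])      # group 1 effect alone

# A vector in Null(X) on which the non-estimable function does not vanish.
null_vec = np.array([1.0, -1.0, -1.0])

beta1 = np.array([2.0, 3.0, 5.0])       # lies in S = {beta : single'beta = 3}
beta2 = beta1 + null_vec                # leaves S but has the same mean vector

print(is_estimable(X, contrast), is_estimable(X, single))
print(X @ beta1, X @ beta2, single @ beta1, single @ beta2)
```

Here `beta1` and `beta2` give the same expected value even though only one of them satisfies the hypothesis, which is the situation $XS \cap XS^c \ne \emptyset$ described in Theorem 8.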

5.  IDENTIFIABLE  PARAMETRIC  FUNCTIONS

5.1 Definitions and Properties

In the discussion of the normal linear model the concept of an estimable function was used to solve some of the problems associated with X being less than full rank. To generalize this concept to functions of the parameter of an exponential family member where $\eta(\cdot)$ is not one to one, we consider the concept of identifiability. (See Theorem 17.)

Let Y be a sample with density $f(y;\theta)$. Following Koopmans and Reiersøl (1950), a function $h(\cdot)$ of $\theta$ will be called identifiable at $\theta_0$ if $f(\cdot;\theta) = f(\cdot;\theta_0)$ implies $h(\theta) = h(\theta_0)$. The significance of identifiability is as follows: Suppose that an observation Y is produced according to some member of the class of densities $f(\cdot;\theta)$, $\theta \in \Theta$. From Y we wish to make an inference about the true $\theta$, say $\theta_0$. The characteristic of $\theta$ in which we are interested is $h(\theta)$. If $h(\cdot)$ is not identifiable at $\theta_0$ then there exists $\theta'$ such that $f(\cdot;\theta') = f(\cdot;\theta_0)$ but $h(\theta') \ne h(\theta_0)$. Thus, even if we could infer the density perfectly, we could still not discriminate between $h(\theta')$ and $h(\theta_0)$.

Theorem 11. If $f(\cdot;\theta_0) = f(\cdot;\theta_1)$ then h is identifiable at $\theta_0$ if and only if it is identifiable at $\theta_1$.

Proof. Suppose $h(\cdot)$ is identifiable at $\theta_0$. By definition, $h(\theta_1) = h(\theta_0)$, and $f(\cdot;\theta) = f(\cdot;\theta_0)$ implies $h(\theta) = h(\theta_0)$. Hence $f(\cdot;\theta) = f(\cdot;\theta_1)$ implies $h(\theta) = h(\theta_1)$.

Theorem 12. In the special case that $Y = (X_1,\dots,X_n)$ is a random sample from a density $f_X(x;\theta)$, then $h(\cdot)$ is identifiable at $\theta_0$ if $f_X(\cdot;\theta) = f_X(\cdot;\theta_0)$ implies $h(\theta) = h(\theta_0)$.

Proof. Since $f(x;\theta) = \prod_{i=1}^{n} f_X(x_i;\theta)$, then $f(\cdot;\theta) = f(\cdot;\theta_0)$ is equivalent to $f_X(\cdot;\theta) = f_X(\cdot;\theta_0)$.

From Theorem 12, if we are observing a random sample from some density then the set of functions identifiable at $\theta_0$ is the same for all sample sizes, and we may check $h(\cdot)$ for identifiability at $\theta_0$, for any particular sample size, by checking at sample size one.

Let $\hat\theta$ be a maximum likelihood estimate for $\theta$; then, from Zehna (1966), $h(\hat\theta)$ is a maximum likelihood estimate for $h(\theta)$.

Theorem 13. $h(\hat\theta)$ is a unique maximum likelihood estimate for $h(\theta)$ only if $h(\cdot)$ is identifiable at $\hat\theta$.

Proof. If $h(\hat\theta)$ is unique then $h(\cdot)$ is constant on $\{\theta \mid f(y;\theta) = f(y;\hat\theta)\}$. Now, $\{\theta \mid f(\cdot;\theta) = f(\cdot;\hat\theta)\} \subset \{\theta \mid f(y;\theta) = f(y;\hat\theta)\}$; thus, $h(\cdot)$ is constant on $\{\theta \mid f(\cdot;\theta) = f(\cdot;\hat\theta)\}$, so $h(\cdot)$ is identifiable at $\hat\theta$.

Now we return to $g(x;\theta)$, a density of form (3).

Theorem 14. Assuming $\eta(\hat\theta)$ is unique, $h(\hat\theta)$ is a unique maximum likelihood estimate for $h(\theta)$ if and only if $h(\cdot)$ is identifiable at $\hat\theta$.

Proof. From Theorem 3,

$$f(x;\theta) = \prod_{i=1}^{n} g(x_i;\theta) = \prod_{i=1}^{n} f_Y[\psi(x_i);\eta(\theta)]$$

so that $\eta(\theta) = \eta(\hat\theta)$ implies $f(\cdot;\theta) = f(\cdot;\hat\theta)$. Therefore, if h is identifiable at $\hat\theta$, then $\eta(\theta) = \eta(\hat\theta)$ implies $h(\theta) = h(\hat\theta)$. The converse is given by Theorem 13.


5.2 Uniformly Identifiable Parametric Functions

Let $g(x;\theta)$ be a density of form (3).

Lemma 6. Assuming $g(x;\theta)$ is in canonical form, then $g(\cdot;\theta) = g(\cdot;\theta_0)$ if and only if $\eta(\theta) = \eta(\theta_0)$.

Proof. From Theorem 3, $g(x;\theta) = f_Y(\psi(x);\eta(\theta))$. Since $g(x;\theta)$ is in canonical form, from Theorem 1, the function $\Phi(\cdot)$ for $f_Y(y;\alpha)$ is strictly convex on $\{\alpha \mid \Phi(\alpha) < \infty\}$. Thus, from Lemma 4, $g(\cdot;\theta) = g(\cdot;\theta_0)$ if and only if $\eta(\theta) = \eta(\theta_0)$.


Theorem 15. Assuming $g(x;\theta)$ to be in canonical form, then $h(\cdot)$ is identifiable at $\theta_0$ if and only if

$$\eta(\theta) = \eta(\theta_0) \ \text{implies}\ h(\theta) = h(\theta_0) .$$

Proof. The result follows from Lemma 6 and the definition of identifiable at $\theta_0$.

The rest of this section depends on Theorem 15, so we will restrict our attention to densities of form (3) in canonical form.

Corollary 7. Let $\eta(\theta_1) = \eta(\theta_0)$; then $h(\cdot)$ is identifiable at $\theta_0$ if and only if $h(\cdot)$ is identifiable at $\theta_1$.

This follows from Theorems 11 and 15.

Koopmans and Reiersøl (1950) call $h(\cdot)$ uniformly identifiable if $h(\cdot)$ is identifiable at $\theta_0$ for all $\theta_0$ in $\Theta$. If $h(\cdot)$ is not uniformly identifiable then the set

$$\Theta_h = \{\theta \mid h(\cdot)\ \text{is identifiable at}\ \theta\}$$

is important.

Theorem 16. For $\alpha$ in $\eta(\Theta_h)$ let $r(\alpha) = h(\theta)$, where $\eta(\theta) = \alpha$; then $r(\cdot)$ is a function from $\eta(\Theta_h)$ to $h(\Theta_h)$; that is, $h(\theta) = r(\eta(\theta))$ for $\theta$ in $\Theta_h$.

Proof. $r(\cdot)$ is a function from $\eta(\Theta_h)$ to $h(\Theta_h)$ since, if $\alpha = \eta(\theta_0) = \eta(\theta_1)$ then, from Theorem 15, $r(\alpha) = h(\theta_0) = h(\theta_1)$.

Corollary 8. $h(\cdot)$ is uniformly identifiable if and only if there exists a function $r(\cdot)$ such that $h(\theta) = r(\eta(\theta))$ for all $\theta$ in $\Theta$.

Proof. If $h(\cdot)$ is uniformly identifiable then $\Theta_h = \Theta$; thus, from Theorem 16, there exists a function $r(\cdot)$ such that $h(\theta) = r(\eta(\theta))$ for all $\theta$ in $\Theta$.

Conversely, if $h(\theta) = r(\eta(\theta))$ for all $\theta$ in $\Theta$, then from Theorem 15 $h(\cdot)$ is uniformly identifiable.


Corollary 9. In the case that $\eta(\theta) = M\theta$ for a matrix M, $h(\theta) = \lambda^T\theta$ for some vector $\lambda$ and $\Theta = E^p$, $h(\cdot)$ is identifiable at $\theta_0$ if and only if $h(\cdot)$ is uniformly identifiable.

Proof. Suppose $h(\cdot)$ is identifiable at $\theta_0$; then from Theorem 15, $M\theta = M\theta_0$ implies $\lambda^T\theta = \lambda^T\theta_0$.

Let $M\theta = 0$; then $M(\theta + \theta_0) = M\theta_0$; thus, $\lambda^T(\theta + \theta_0) = \lambda^T\theta_0$, so $\lambda^T\theta = 0$.

Thus, $\lambda^T$ is perpendicular to Null(M), so $\lambda^T$ is in Row(M). Therefore

$$\lambda^T\theta = r^T M\theta$$

for some vector r and, from Corollary 8, $h(\theta) = \lambda^T\theta$ is uniformly identifiable. The converse is a special case.

Theorem 17. In the normal linear model $\lambda^T\beta$ is estimable if and only if it is uniformly identifiable.

Proof. Suppose $\lambda^T\beta$ is estimable; then from Theorem 6,

$$\lambda^T\beta = r^T X\beta$$

for all $\beta$; thus, from Corollary 8, $\lambda^T\beta$ is uniformly identifiable.

Conversely, if $\lambda^T\beta$ is uniformly identifiable then, again applying Corollary 8, there exists a function $r(\cdot)$ such that

$$\lambda^T\beta = r(X\beta)$$

for all $\beta$. Now, $\lambda^T\beta$ being linear implies $r(\cdot)$ must be linear; thus, there exists r such that

$$\lambda^T\beta = r^T X\beta ,$$

and from Theorem 6 $\lambda^T\beta$ is estimable.

This result was obtained in Reiersøl (1963) by a different method.


5.3 Comparison of Uniformly Identifiable Functions and Those Possessing an Unbiased Estimate

From Theorem 17 we see that the concept of uniform identifiability is one possible generalization of the concept of an estimable function. Another generalization is those functions having an unbiased estimate.
Let Y be a sample with density $f(y;\theta)$.

Theorem 18. If $u(\theta)$ has an unbiased estimate, then $u(\cdot)$ is uniformly identifiable.

Proof. There exists a function $z(\cdot)$ such that

$$u(\theta) = \int z(y)\, f(y;\theta)\, d\mu(y)$$

for all $\theta$; thus, $f(\cdot;\theta_0) = f(\cdot;\theta_1)$ implies $u(\theta_0) = u(\theta_1)$.

In the following example we look at a density where there is a function which is uniformly identifiable, but does not have an unbiased estimate.

Example 4. Let $Y_1, Y_2$ be independent, binary random variables such that

$$P(Y_1 = 1) = \frac{\exp(\beta_1+\beta_2+\beta_3)}{1 + \exp(\beta_1+\beta_2+\beta_3)}$$

and

$$P(Y_2 = 1) = \frac{\exp(\beta_1+\beta_2-\beta_3)}{1 + \exp(\beta_1+\beta_2-\beta_3)} .$$

Letting $X_1 = (1,1,1)$, $X_2 = (1,1,-1)$ and $\beta = (\beta_1,\beta_2,\beta_3)^T$, we may write the density of $Y_1, Y_2$ as

$$\exp\{\, y_1 X_1\beta + y_2 X_2\beta - \ln(1 + \exp(X_1\beta)) - \ln(1 + \exp(X_2\beta)) \,\} ,$$

which is of form (3), in canonical form.

Now, $\eta(\cdot)$ is linear with coefficient matrix $X = (X_1^T, X_2^T)^T$. So, from Corollary 8, the function

$$\lambda_3^T\beta = (\tfrac{1}{2}, -\tfrac{1}{2})\, X\beta = \beta_3$$

is uniformly identifiable.

We will show that $\lambda_3^T\beta$ does not possess an unbiased estimate. Assume there exists a function $z(\cdot,\cdot)$ such that $\lambda_3^T\beta$ is equal to the expected value of $z(Y_1,Y_2)$ for all $\beta$.

Now,

$$\lim_{\beta_3\to\infty} \frac{\exp(y_1 X_1\beta)}{1 + \exp(X_1\beta)} = \begin{cases} 1, & y_1 = 1 \\ 0, & y_1 = 0 \end{cases}$$

and

$$\lim_{\beta_3\to\infty} \frac{\exp(y_2 X_2\beta)}{1 + \exp(X_2\beta)} = \begin{cases} 0, & y_2 = 1 \\ 1, & y_2 = 0 ; \end{cases}$$

thus

$$\lim_{\beta_3\to\infty} E(z(Y_1,Y_2)) = \lim_{\beta_3\to\infty} \sum z(y_1,y_2)\, \frac{\exp(y_1 X_1\beta)}{1 + \exp(X_1\beta)}\, \frac{\exp(y_2 X_2\beta)}{1 + \exp(X_2\beta)} = z(1,0) ,$$

where the summation is over the sample space. However, the expected value of $z(Y_1,Y_2)$ is $\beta_3$ for all $\beta$, so $z(1,0) = \lim_{\beta_3\to\infty} \beta_3 = +\infty$. Now, if $z(1,0) = \infty$ then the expected value of $z(Y_1,Y_2)$ is also $\infty$ for all $\beta$, which is a contradiction.
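
The limiting argument of Example 4 can be illustrated numerically. The sketch below is not from the original text: the function names `cell_probs` and `expected_z` and the table `z` are hypothetical, and `z` is an arbitrary finite candidate estimator chosen only for illustration. It shows the probability mass concentrating on the outcome (1,0) as $\beta_3$ grows, so the expectation of any finite `z` stays bounded and cannot track $\beta_3$.

```python
import numpy as np
from itertools import product

# Rows X_1 and X_2 of Example 4.
X = np.array([[1.0, 1.0, 1.0],
              [1.0, 1.0, -1.0]])

def cell_probs(beta):
    """Joint probabilities of (Y1, Y2) under the two logistic success probabilities."""
    p = 1.0 / (1.0 + np.exp(-X @ np.asarray(beta, dtype=float)))
    probs = {}
    for y1, y2 in product((0, 1), repeat=2):
        probs[(y1, y2)] = (p[0] if y1 else 1 - p[0]) * (p[1] if y2 else 1 - p[1])
    return probs

def expected_z(z, beta):
    """Expected value of a candidate estimator z(Y1, Y2) given as a lookup table."""
    return sum(z[y] * pr for y, pr in cell_probs(beta).items())

# Arbitrary finite candidate estimator (an assumption for illustration).
z = {(0, 0): -3.0, (0, 1): 0.5, (1, 0): 7.0, (1, 1): 2.0}

for b3 in (1.0, 5.0, 20.0, 50.0):
    beta = [0.0, 0.0, b3]
    print(b3, cell_probs(beta)[(1, 0)], expected_z(z, beta))
```

As $\beta_3$ increases the printed expectation approaches the table entry for (1,0) rather than growing with $\beta_3$.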

5.4 Examples

Example 5 (Logistic Life Study Model - Continued). In Example 2 we saw that a unique maximum likelihood estimate for $(\beta^T : \eta_1,\dots,\eta_m)^T$ exists if and only if the matrix Z is full rank. Assuming that Z is not full rank, we wish to consider the class of uniformly identifiable functions.

From Corollary 8 a function, $h(\cdot)$, is uniformly identifiable if and only if there exists a function $r(\cdot)$ such that

$$h[(\beta^T : \eta_1,\dots,\eta_m)^T] = r(Z(\beta^T : \eta_1,\dots,\eta_m)^T)$$

or, for differentiable $r(\cdot)$,

$$\nabla h[(\beta^T : \eta_1,\dots,\eta_m)^T] = Z^T \nabla r(Z(\beta^T : \eta_1,\dots,\eta_m)^T) ;$$

thus, for linear $h(\cdot)$, $h(\cdot)$ is uniformly identifiable if and only if for some vector r

$$\lambda = Z^T r$$

where $\lambda = \nabla h[(\beta^T : \eta_1,\dots,\eta_m)^T]$. In other words, the class of linear uniformly identifiable functions is that class of functions whose gradients are in Row(Z).

Let $\lambda_i$, $i = 1,\dots,s$, span the space orthogonal to the row space of Z. The functions $\lambda_i^T(\beta^T : \eta_1,\dots,\eta_m)^T$, $i = 1,\dots,s$, are useful in the computation of a maximum likelihood estimate. As noted in Example 2, there exists a unique $\hat\eta$ in the range of $Z(\beta^T : \eta_1,\dots,\eta_m)^T$ which maximizes the likelihood; thus, a maximum likelihood estimate for $(\beta^T : \eta_1,\dots,\eta_m)^T$ can be found by solving

$$Z(\beta^T : \eta_1,\dots,\eta_m)^T = \hat\eta .$$

If we solve these equations under the restriction $\lambda_i^T(\beta^T : \eta_1,\dots,\eta_m)^T = 0$, $i = 1,\dots,s$, then we have a full rank system of equations, and thus, a unique solution.
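
The restricted system described above can be sketched numerically. This is a minimal illustration under stated assumptions: the matrix `Z`, the vector `eta_hat`, and the helper name `restricted_solution` are hypothetical and chosen only so that `Z` is rank deficient; the null-space basis is obtained from a singular value decomposition, which is one convenient choice rather than the method of the text.

```python
import numpy as np

def restricted_solution(Z, eta_hat, tol=1e-10):
    """Solve Z theta = eta_hat subject to Lam theta = 0, where the rows of Lam
    span the space orthogonal to Row(Z)."""
    Z = np.asarray(Z, dtype=float)
    _, s, vt = np.linalg.svd(Z)
    rank = int((s > tol * s.max()).sum())
    Lam = vt[rank:]                                  # basis of Null(Z)
    A = np.vstack([Z, Lam])                          # stacked system has full column rank
    b = np.concatenate([eta_hat, np.zeros(len(Lam))])
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return theta, Lam, A

# Hypothetical rank-deficient matrix: third column is the sum of the first two.
Z = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [0, 1, 1]], dtype=float)
eta_hat = Z @ np.array([0.5, -1.0, 2.0])             # a point in the range of Z

theta, Lam, A = restricted_solution(Z, eta_hat)
print(theta, np.linalg.matrix_rank(A))
```

Under these assumptions the restricted solution coincides with the minimum-norm solution `np.linalg.pinv(Z) @ eta_hat`, since both lie in the row space of `Z`.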

The following are examples of densities of form (3) where $\eta(\cdot)$ is not a one to one function and which, like the logistic model, satisfy hypothesis (i) of Theorem 4 for all sample sizes.

Example  6 (Retrospective  Study),  Cox  (1970). 

We  might  like  to  estimate  the  conditional  probability  of 
getting  cancer  given  a person  smokes  minus  the  conditional 
probability  of  getting  cancer  given  a person  does  not  smoke. 
The  ideal  way  to  do  this  would  be  to  take  a sample  of  both 
smokers  and  nonsmokers,  follow  the  state  of  their  health  for 
a number  of  years,  and  then,  check  the  group  to  see  how  many 
develop  lung  cancer.  This  is  called  a prospective  study. 

In practice this can be a long and expensive process; thus, another method, called a retrospective study, is sometimes used.

In a retrospective study we take a group of lung cancer patients and a control group and check to see whether or not they smoked. We can express both studies diagrammatically as follows:


Prospective Study

                    non-smokers (u=0)                                      smokers (u=1)

no cancer (w=0)     $P(w=0 \mid u=0) = \pi_{00}/(\pi_{00}+\pi_{01})$      $P(w=0 \mid u=1) = \pi_{10}/(\pi_{10}+\pi_{11})$

cancer (w=1)        $P(w=1 \mid u=0) = \pi_{01}/(\pi_{00}+\pi_{01})$      $P(w=1 \mid u=1) = \pi_{11}/(\pi_{10}+\pi_{11})$

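Retrospective Study

                    no cancer (w=0)                                        cancer (w=1)

non-smokers (u=0)   $P(u=0 \mid w=0) = \pi_{00}/(\pi_{00}+\pi_{10})$      $P(u=0 \mid w=1) = \pi_{01}/(\pi_{01}+\pi_{11})$

smokers (u=1)       $P(u=1 \mid w=0) = \pi_{10}/(\pi_{00}+\pi_{10})$      $P(u=1 \mid w=1) = \pi_{11}/(\pi_{01}+\pi_{11})$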

where $\pi_{ij} = P(u=i,\, w=j)$, $i,j = 0,1$. The parameter space is

$$\{\pi = (\pi_{00},\pi_{01},\pi_{10},\pi_{11}) \mid \pi_{00}+\pi_{01}+\pi_{10}+\pi_{11} = 1,\ 0 \le \pi_{ij} \le 1\} ;$$

however, due to the nature of the data collection methods we may obtain meaningful estimates only for the conditional probabilities in either study.

Henceforth we confine attention to the retrospective study. Let $p_1 = P(u=0 \mid w=0)$, $p_2 = P(u=0 \mid w=1)$ and $r_{ij}$ the number of observations in the ij cell; then the log likelihood for a retrospective sample is

$$\ell(p_1,p_2) = r_{00}\ln p_1 + r_{10}\ln(1-p_1) + r_{01}\ln p_2 + r_{11}\ln(1-p_2) .$$

Now, let

$$\eta_1(\pi) = \pi_{00}/(\pi_{00}+\pi_{10})$$

and

$$\eta_2(\pi) = \pi_{01}/(\pi_{01}+\pi_{11}) .$$

The range of $\eta(\cdot) = (\eta_1(\cdot),\eta_2(\cdot))^T$ is the unit square. $\ell(\eta_1(\cdot),\eta_2(\cdot))$ is the log likelihood of the sample as a function of the parameters $(\pi_{00},\pi_{01},\pi_{10},\pi_{11})$. The unique maximum of $\ell(\cdot,\cdot)$ on the range of $\eta(\cdot)$ is $(r_{00}/(r_{00}+r_{10}),\ r_{01}/(r_{01}+r_{11}))$; thus hypothesis (i) of Theorem 4 is satisfied. However, $\eta(\cdot)$ is a many to one function for each value of its range; in fact,

$$\eta(\pi) = \eta(\pi^*)$$

if and only if $\pi_{10}/\pi_{00} = \pi^*_{10}/\pi^*_{00}$ and $\pi_{11}/\pi_{01} = \pi^*_{11}/\pi^*_{01}$; and thus, we will not be able to obtain a unique maximum likelihood estimate for $\pi$.

We might wish to ask what parametric functions are uniformly identifiable. Consider

$$h_1(\pi) = \pi_{11}/(\pi_{10}+\pi_{11}) - \pi_{01}/(\pi_{00}+\pi_{01})$$

and

$$h_2(\pi) = \ln(\pi_{11}/\pi_{10}) - \ln(\pi_{01}/\pi_{00}) = \ln\frac{\pi_{11}\pi_{00}}{\pi_{10}\pi_{01}} .$$

Now, $h_1(\pi)$ is the difference of the conditional probability of cancer given a person smokes and the conditional probability of cancer given a person does not smoke, while $h_2(\pi)$ is the difference of the log odds of the two conditional probabilities.

First,

$$h_2(\pi) = \ln\frac{\pi_{11}\pi_{00}}{\pi_{10}\pi_{01}} = \ln\Bigl(\frac{\eta_1(\pi)}{1-\eta_1(\pi)}\Bigr) - \ln\Bigl(\frac{\eta_2(\pi)}{1-\eta_2(\pi)}\Bigr) ;$$

thus, from Corollary 8, $h_2(\cdot)$ is uniformly identifiable.

On the other hand, we will show that $\Theta_{h_1} = \{\pi \mid h_1(\pi) = 0\}$ is a proper subset of $\Theta$, and hence, $h_1$ is not uniformly identifiable. With this objective in mind suppose $h_1(\pi_0) = 0$ and write

$$h_1(\pi) = \frac{\pi_{01}}{\pi_{01}+\pi_{00}\gamma(\pi)} - \frac{\pi_{01}}{\pi_{01}+\pi_{00}}$$

where

$$\gamma(\pi) = \frac{1-\eta_1(\pi)}{\eta_1(\pi)} \cdot \frac{\eta_2(\pi)}{1-\eta_2(\pi)} .$$

Then, $h_1(\pi) = 0$ if and only if $\gamma(\pi) = 1$. From Corollary 8, $\gamma(\cdot)$ is uniformly identifiable; thus from Theorem 15, if $\eta(\pi) = \eta(\pi_0)$ then $\gamma(\pi) = \gamma(\pi_0)$ and $h_1(\pi) = h_1(\pi_0)$.

Thus $h_1$ is identifiable at $\pi_0$.

Now, suppose $h_1(\pi_1) \ne 0$. We want to show that $\pi_1$ is not in $\Theta_{h_1}$. Let $\pi_1 = (\pi_{00},\pi_{01},\pi_{10},\pi_{11})$ and $\pi_c = (c\pi_{00},\pi_{01},c\pi_{10},\pi_{11})$; then $\eta(\pi_c) = \eta(\pi_1)$ and $\gamma(\pi_c) = \gamma(\pi_1)$. Now, $h_1(\pi_1) = h_1(\pi_c)$ says

$$\frac{\pi_{01}}{\pi_{01}+c\pi_{00}\gamma(\pi_1)} - \frac{\pi_{01}}{\pi_{01}+c\pi_{00}} = \frac{\pi_{01}}{\pi_{01}+\pi_{00}\gamma(\pi_1)} - \frac{\pi_{01}}{\pi_{01}+\pi_{00}}$$

or

$$\frac{c\,(1-\gamma(\pi_1))}{(\pi_{01}+c\pi_{00}\gamma(\pi_1))(\pi_{01}+c\pi_{00})} = \frac{1-\gamma(\pi_1)}{(\pi_{01}+\pi_{00}\gamma(\pi_1))(\pi_{01}+\pi_{00})} ,$$

and, since $h_1(\pi_1) \ne 0$, $\gamma(\pi_1) \ne 1$, so

$$(\pi_{01}+c\pi_{00}\gamma(\pi_1))(\pi_{01}+c\pi_{00}) = c\,(\pi_{01}+\pi_{00}\gamma(\pi_1))(\pi_{01}+\pi_{00}) .$$

Thus, if $h_1(\pi_1) = h_1(\pi_c)$, then

$$\pi_{01}^2 - c\,(\pi_{01}^2 + \pi_{00}^2\gamma(\pi_1)) + c^2\pi_{00}^2\gamma(\pi_1) = 0 .$$

But this last equation cannot hold for all c $(0 < c < 1)$, so there exists some $c'$ such that $\eta(\pi_1) = \eta(\pi_{c'})$ but $h_1(\pi_1) \ne h_1(\pi_{c'})$; thus, from Theorem 15, $h_1(\cdot)$ is not identifiable at $\pi_1$.
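
The contrast between $h_1$ and $h_2$ in this example can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names, the two probability vectors, and the choice c = 0.5 are assumptions, and `rescale` renormalizes the perturbed vector so that it remains a probability vector, which does not change any of the ratios involved.

```python
import numpy as np

# Cell probabilities are ordered (pi00, pi01, pi10, pi11), with pi_ij = P(u=i, w=j).

def eta(pi):
    p00, p01, p10, p11 = pi
    return np.array([p00 / (p00 + p10), p01 / (p01 + p11)])

def h1(pi):
    """Difference of the conditional probabilities of cancer (smokers minus non-smokers)."""
    p00, p01, p10, p11 = pi
    return p11 / (p10 + p11) - p01 / (p00 + p01)

def h2(pi):
    """Difference of the corresponding log odds."""
    p00, p01, p10, p11 = pi
    return np.log(p11 * p00 / (p10 * p01))

def rescale(pi, c):
    """Multiply the w=0 cells by c and renormalize; eta is unchanged by this."""
    p00, p01, p10, p11 = pi
    q = np.array([c * p00, p01, c * p10, p11])
    return q / q.sum()

pi1 = np.array([0.4, 0.1, 0.2, 0.3])   # hypothetical point with h1 != 0
pi0 = np.array([0.3, 0.3, 0.2, 0.2])   # hypothetical point with h1 == 0
pic = rescale(pi1, 0.5)

print(eta(pi1), eta(pic))              # same natural parameter
print(h2(pi1), h2(pic))                # same value of h2
print(h1(pi1), h1(pic))                # different values of h1
print(h1(pi0), h1(rescale(pi0, 0.5)))  # both zero
```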

Example 7 (The Projectile Example). In all the preceding examples the data have been discrete. In this example we consider a problem involving continuous data.

Let $X_i$ $(i = 1,\dots,n)$ be independent and identically distributed observations of the distance traveled by a projectile fired at elevation $\theta$ and initial velocity v. Assuming no air resistance, the distance traveled is given by $v^2 \sin 2\theta / g$, where g is the gravitational constant. Suppose the $X_i$'s are exponentially distributed with mean $v^2 \sin 2\theta / g$.

Except for an additive constant, the log likelihood can be written as $\ell(\eta(v,\theta))$ where

$$\ell(\lambda) = n \ln\lambda - \lambda \sum_{i=1}^{n} x_i ,$$

and

$$\eta(v,\theta) = g / (v^2 \sin 2\theta) .$$

The parameter space is $\{(v,\theta) \mid \theta\ \text{is in}\ (0,\pi/2)\ \text{and}\ v\ \text{is in}\ (0,\infty)\}$.

$\ell(\cdot)$ has a unique maximum on $(0,\infty)$, the range of $\eta(v,\theta)$; in fact

$$\hat\lambda = n \Big/ \sum_{i=1}^{n} x_i .$$

The function $\eta(\cdot,\cdot)$ is not one to one, so there is no unique maximum likelihood estimate of $(v,\theta)$.

Let us now consider the first component of the terminal velocity of the projectile, a function of the parameter which might be of interest. Thus, $h_3(v,\theta) = v\cos\theta$. We show that $h_3(\cdot)$ is not uniformly identifiable by proving $\Theta_{h_3} \subset \{(v,\theta) \mid v > 0,\ \theta = \pi/4\}$.

Let $(v^*,\theta^*)$ be in $\Theta_{h_3}$; we first show that

$$\{(v,\theta) \mid \eta(v,\theta) = \eta(v^*,\theta^*)\} = \{(v^*,\theta^*)\} .$$

Suppose $\eta(v',\theta') = \eta(v^*,\theta^*)$; then from Theorem 15, since $(v^*,\theta^*)$ is in $\Theta_{h_3}$, $h_3(v',\theta') = h_3(v^*,\theta^*)$. Thus,

$$(v')^2 \sin 2\theta' = (v^*)^2 \sin 2\theta^*$$

and

$$v'\cos\theta' = v^*\cos\theta^* .$$

Now,

$$\Bigl(\frac{v'}{v^*}\Bigr)^2 = \frac{\sin 2\theta^*}{\sin 2\theta'} = \frac{\sin\theta^*\cos\theta^*}{\sin\theta'\cos\theta'}$$

and

$$\frac{v'}{v^*} = \frac{\cos\theta^*}{\cos\theta'} ,$$

which implies

$$\frac{v'}{v^*} = \frac{\sin\theta^*}{\sin\theta'} = \frac{\cos\theta^*}{\cos\theta'} ,$$

and

$$\sin(\theta^* - \theta') = \sin\theta^*\cos\theta' - \cos\theta^*\sin\theta' = 0 ,$$

or, since $\theta'$ and $\theta^*$ are in $(0,\pi/2)$, $\theta' = \theta^*$. Also, $v' = v^*$. Therefore, $\{(v,\theta) \mid \eta(v,\theta) = \eta(v^*,\theta^*)\} \subset \{(v^*,\theta^*)\}$.

The reverse containment is obvious.

But this equivalence of sets implies that $\theta^* = \pi/4$, since if $\theta^* < \pi/4$, let $\theta_1 = \pi/4 + (\pi/4 - \theta^*)$ and $v_1 = v^*$. Hence $\eta(v^*,\theta^*) = \eta(v_1,\theta_1)$ but $\theta^* \ne \theta_1$. A similar argument holds if $\theta^* > \pi/4$.
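
The reflection about $\pi/4$ used in the last step is easy to verify numerically. The following sketch is illustrative only; the value of `G`, the velocity, and the angle are assumed values, and the function names are hypothetical.

```python
import math

G = 9.81  # gravitational constant in m/s^2 (assumed value for illustration)

def eta(v, theta):
    """Natural parameter g / (v^2 sin 2*theta), the reciprocal of the mean range."""
    return G / (v ** 2 * math.sin(2 * theta))

def h3(v, theta):
    """First component of the velocity, v cos(theta)."""
    return v * math.cos(theta)

v = 30.0
theta = math.radians(20.0)
theta_mirror = math.pi / 4 + (math.pi / 4 - theta)   # reflection about pi/4

print(eta(v, theta), eta(v, theta_mirror))   # equal, so the densities coincide
print(h3(v, theta), h3(v, theta_mirror))     # differ, so h3 is not identifiable here
```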

6.  IDENTIFIABLE  SETS  AND  TESTABLE  HYPOTHESES

6.1 Identifiable Sets

In Chapter 5 we considered the problem of making inferences in a family of form (3), where $\eta(\cdot)$ is not one to one. As an example, we looked at the normal linear model of less than full rank, considering the concepts of estimable functions and testable hypotheses which have been developed for this particular case. We then showed that the uniform identifiability of Koopmans and Reiersøl (1950) is a generalization of the concept of estimable functions.

In this chapter we generalize the concept of testable hypotheses.

Let Y be a sample with density $f(y;\theta)$. A subset S of $\Theta$ is called identifiable if and only if $\theta_0$ in S and $f(\cdot;\theta) = f(\cdot;\theta_0)$ implies $\theta$ in S. We now consider some basic properties of identifiable sets.

Let $I_S(\cdot)$ be the indicator function of S ($I_S(\theta) = 1$ for $\theta$ in S, 0 elsewhere).

Lemma 7. S is identifiable if and only if $I_S(\cdot)$ is uniformly identifiable.

Proof. The statement "$I_S(\theta_0) = 1$ and $f(\cdot;\theta) = f(\cdot;\theta_0)$ implies $I_S(\theta) = 1$" is equivalent to "$f(\cdot;\theta) = f(\cdot;\theta_0)$ implies $I_S(\theta) = I_S(\theta_0)$."



As an immediate consequence of Lemma 7, certain results for identifiable functions also hold for identifiable sets. In particular, if Y is a random sample from some density then, by applying Lemma 7 and then Theorem 12, the collection of identifiable sets is the same for all sample sizes, and we may check for identifiability of a particular set at a particular sample size by checking at sample size 1.

Now, for m in the range of $h(\cdot)$, let $S_m = \{\theta \mid h(\theta) = m\}$.

Lemma 8. $S_m$ is identifiable if and only if $h(\cdot)$ is identifiable at $\theta$ for all $\theta$ in $S_m$.

Proof. The statement "$\theta_0 \in S_m$ and $f(\cdot;\theta) = f(\cdot;\theta_0)$ implies $\theta \in S_m$" is equivalent to "$f(\cdot;\theta) = f(\cdot;\theta_0)$ implies $h(\theta) = h(\theta_0)$, for all $\theta_0 \in S_m$."

Corollary 10. $S_m$ is identifiable for all m in $h(\Theta)$ if and only if $h(\cdot)$ is uniformly identifiable.

Example 8 (Retrospective Study - Continued). In Example 6 it was shown that $h_1(\cdot)$ is not uniformly identifiable. It was also shown that

$$\Theta_{h_1} = \{\pi \mid h_1(\pi) = 0\} .$$

Thus, for $h_1(\cdot)$, $S_m$ is identifiable for m = 0 but is not identifiable for any other value of m.

6.2 Identifiable Sets and the Exponential Family

Let $g(x;\theta)$ be a density of form (3) and S be a subset of $\Theta$. Throughout this section we will restrict our attention to $g(x;\theta)$ in canonical form.

Theorem 19. Suppose $g(x;\theta)$ is in canonical form; then S is identifiable if and only if $\theta_0$ in S and $\eta(\theta) = \eta(\theta_0)$ implies $\theta$ is in S.

Proof. Apply Lemma 6 and the definition of "S is identifiable."

Corollary 11. S is identifiable if and only if $\eta(S) \cap \eta(S^c) = \emptyset$.

Proof. The statement "$\eta(S) \cap \eta(S^c) = \emptyset$" is equivalent to "$\theta_0$ in S and $\eta(\theta) = \eta(\theta_0)$ implies $\theta$ is in S."

Let G be a subset of $\eta(\Theta)$; then $\eta^{-1}(G)$ will denote $\{\theta \mid \eta(\theta)\ \text{is in}\ G\}$.

Theorem 20. Suppose S is identifiable and R is not; then

$$\eta^{-1}(\eta(S \cap R)) = S \cap \eta^{-1}(\eta(R)) .$$

Proof. The statement "$\eta^{-1}(\eta(S \cap R)) \subset S \cap \eta^{-1}(\eta(R))$" is equivalent to "$\eta(\theta) = \eta(\theta_0)$ and $\theta_0$ in $S \cap R$ implies $\theta$ is in $S \cap \eta^{-1}(\eta(R))$", which is true from Theorem 19 and the definition of $\eta^{-1}(\eta(R))$.

The statement "$\eta^{-1}(\eta(S \cap R)) \supset S \cap \eta^{-1}(\eta(R))$" is equivalent to "$\eta(\theta) = \eta(\theta_0)$, $\theta$ in S and $\theta_0$ in R implies $\theta$ in $\eta^{-1}(\eta(S \cap R))$", which is true from Theorem 19 and the definition of $\eta^{-1}(\eta(S \cap R))$.
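
Corollary 11 and Theorem 20 can be illustrated on a small finite parameter space. The sketch below is a toy example, not taken from the text: the parameter labels, the map `eta`, and the sets `S` and `R` are hypothetical, with $\eta$ represented as a dictionary from parameter points to natural-parameter values.

```python
def image(eta, A):
    """eta(A): the set of natural-parameter values attained on A."""
    return {eta[t] for t in A}

def preimage(eta, G):
    """eta^{-1}(G): all parameter points mapped into G."""
    return {t for t in eta if eta[t] in G}

def is_identifiable(eta, S):
    """Corollary 11: S is identifiable iff eta(S) and eta(S^c) are disjoint."""
    complement = set(eta) - set(S)
    return image(eta, S).isdisjoint(image(eta, complement))

# Hypothetical many-to-one map on a five-point parameter space.
eta = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2}

S = {"a", "b", "e"}   # a union of level sets of eta
R = {"a", "c"}        # cuts through two level sets

lhs = preimage(eta, image(eta, S & R))
rhs = S & preimage(eta, image(eta, R))
print(is_identifiable(eta, S), is_identifiable(eta, R), lhs, rhs)
```

In this toy case `S` is identifiable, `R` is not, and the two sides of the identity in Theorem 20 agree.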


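Theorem 21. Let $\mathcal{I}$ be the collection of identifiable subsets of $\Theta$. $\mathcal{I}$ is closed under intersection, union and complement.

Proof. Let S and $S_\lambda$, $\lambda$ in $\Lambda$, be in $\mathcal{I}$.

First, $\bigcap_{\lambda\in\Lambda} S_\lambda$ is in $\mathcal{I}$. Suppose $\theta_0$ is in $\bigcap_{\lambda\in\Lambda} S_\lambda$ and $\eta(\theta) = \eta(\theta_0)$. Then, for all $\lambda$ in $\Lambda$, $\theta_0$ is in $S_\lambda$ and $\eta(\theta) = \eta(\theta_0)$. Thus, from Theorem 19, $\theta$ is in $\bigcap_{\lambda\in\Lambda} S_\lambda$ and $\bigcap_{\lambda\in\Lambda} S_\lambda$ is identifiable.

Second, $\bigcup_{\lambda\in\Lambda} S_\lambda$ is in $\mathcal{I}$. Suppose $\theta_0$ is in $\bigcup_{\lambda\in\Lambda} S_\lambda$ and $\eta(\theta) = \eta(\theta_0)$. Then, for some $\lambda$ in $\Lambda$, $\theta_0$ is in $S_\lambda$ and $\eta(\theta) = \eta(\theta_0)$; thus, from Theorem 19, $\theta$ is in $\bigcup_{\lambda\in\Lambda} S_\lambda$ and $\bigcup_{\lambda\in\Lambda} S_\lambda$ is identifiable.

Finally, as a direct consequence of the definition of "S is identifiable" we see that $S^c$ is identifiable.

Let $h_i(\cdot)$, $i = 1,\dots,\ell$, be functions of $\theta$. Using Theorem 21 we may extend the results of Lemma 8 as follows.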

Theorem 22. Let $S_{m_i} = \{\theta \mid h_i(\theta) = m_i\}$, $i = 1,\dots,\ell$. $\bigcap_{i=1}^{\ell} S_{m_i}$ is identifiable for all $(m_1,\dots,m_\ell)$, $m_i$ in $h_i(\Theta)$, if and only if $h_i(\cdot)$ is uniformly identifiable for all i.

Proof. Suppose $\bigcap_{i=1}^{\ell} S_{m_i}$ is identifiable for all $(m_1,\dots,m_\ell)$. We show that, for every i, $S_{m_i}$ is identifiable for all $m_i$ in $h_i(\Theta)$; thus, from Corollary 10, $h_i(\cdot)$ is uniformly identifiable for all i. Let $1 \le j \le \ell$, $m_j'$ be a fixed value of $m_j$, and $\Lambda_j = \{(m_1,\dots,m_\ell) \mid m_j = m_j'\}$.

Now, since $\bigcap_{i=1}^{\ell} S_{m_i}$ is identifiable for all $(m_1,\dots,m_\ell)$, from Theorem 21,

$$\bigcup_{(m_1,\dots,m_\ell)\in\Lambda_j}\ \bigcap_{i=1}^{\ell} S_{m_i}$$

is identifiable. However,

$$\bigcup_{(m_1,\dots,m_\ell)\in\Lambda_j}\ \bigcap_{i=1}^{\ell} S_{m_i} = S_{m_j'} \cap \Bigl(\bigcap_{i\ne j}\ \bigcup_{m_i\in h_i(\Theta)} S_{m_i}\Bigr) = S_{m_j'} ,$$

since, for any i,

$$\bigcup_{m_i\in h_i(\Theta)} S_{m_i} = \Theta .$$

So, $S_{m_j'}$ is identifiable.

Conversely, suppose $h_i(\cdot)$ is uniformly identifiable for all i. Then, from Corollary 10 and Theorem 21, $\bigcap_{i=1}^{\ell} S_{m_i}$ is identifiable for all $(m_1,\dots,m_\ell)$.

6.3 Generalizing Testable Hypotheses

Let $S = \{\beta \mid \lambda_i^T\beta = m_i,\ i = 1,\dots,\ell\}$ where $\beta$ and $\sigma^2$ are the parameters of a normal linear model. Now,

$$\eta(S) = \{(\beta^T X^T/\sigma^2,\ -1/2\sigma^2) \mid \beta\ \text{is in}\ S,\ \sigma^2 > 0\} ;$$

thus, $\eta(S) \cap \eta(S^c) = \emptyset$ if and only if $XS \cap XS^c = \emptyset$. Therefore, from Theorem 8 and Corollary 11, S is identifiable if and only if $\lambda_i^T\beta$ is estimable for all i, so the hypothesis H, stating that $\beta$ is in S, is testable if and only if S is identifiable.

We generalize the concept of a testable hypothesis as follows: Let Y be a sample with density $f(y;\theta)$. The hypothesis H, $\theta$ in S, is testable if and only if S is identifiable.

To see the importance of this definition, let ~H (read not H) state that $\theta$ is in $S^c$. If S is not identifiable then there is a $\theta_0$ in S and a $\theta_1$ in $S^c$ such that $f(\cdot;\theta_0) = f(\cdot;\theta_1)$; thus, the sets of distributions under H and ~H are not disjoint. This means that based on observed Y we cannot say whether H or ~H is true.


Certain  hypothesis  testing  results  for  the  normal 
linear  model  will  extend  to  densities  g(x;0)  of  form  (3) 
in  canonical  form. 

Searle (1971) shows that in the normal linear model the sum of squares used in testing a hypothesis with some estimable components and some non-estimable components is the same as the sum of squares for testing that same hypothesis but with the non-estimable components deleted. We may extend Searle's result in the following way.

Suppose that S and R are both subsets of $\Theta$, S being identifiable and R not. Let H state that $\theta$ is in $S \cap R$ and H' state that $\theta$ is in $S \cap \eta^{-1}(\eta(R))$.

Theorem 24. The maximum likelihood ratio statistic testing H versus ~H is the same as that testing H' versus ~H'.

Proof. Let $x_1,\dots,x_n$ be an observed sample from $g(x;\theta)$. The maximum likelihood ratio statistic testing H versus ~H is

$$\frac{\sup_{\theta\in S\cap R}\ \prod_{i=1}^{n} g(x_i;\theta)}{\sup_{\theta\in\Theta}\ \prod_{i=1}^{n} g(x_i;\theta)} .$$

From Theorems 3 and 20

$$\sup_{\theta\in S\cap R}\ \prod_{i=1}^{n} g(x_i;\theta) = \sup_{\theta\in S\cap R}\ \prod_{i=1}^{n} f_Y(\psi(x_i);\eta(\theta))$$

$$= \sup_{\theta\in\eta^{-1}(\eta(S\cap R))}\ \prod_{i=1}^{n} f_Y(\psi(x_i);\eta(\theta))$$

$$= \sup_{\theta\in S\cap\eta^{-1}(\eta(R))}\ \prod_{i=1}^{n} g(x_i;\theta) ,$$

which is the numerator of the maximum likelihood ratio statistic testing H' versus ~H'. The denominator is the same for testing both hypotheses.

To see that Theorem 24 is truly a generalization of Searle (1971), let $S = \{\beta \mid \lambda_i^T\beta = c_i,\ i = 1,\dots,j\}$ and $R = \{\beta \mid \lambda_i^T\beta = c_i,\ i = j+1,\dots,\ell\}$ where $\lambda_i^T\beta$ is estimable for $i = 1,\dots,j$, and not estimable for $i = j+1,\dots,\ell$.

From Theorem 24 the sum of squares for H versus ~H is the same as the sum of squares for H' versus ~H'. Now,

$$\eta^{-1}(\eta(R)) = \{(\beta^T, \sigma^2)^T \mid \beta\ \text{is in}\ X^{-1}(X(R))\ \text{and}\ \sigma^2 > 0\} .$$

From Theorem 9, $X^{-1}(X(R)) = E^p$; thus, $\eta^{-1}(\eta(R)) = \Theta$, so $S \cap \eta^{-1}(\eta(R)) = S$. Thus, the sum of squares for testing $\beta$ in $S \cap R$ is the same as the sum of squares for testing $\beta$ in S.




Example 9 (Logistic Life Study Model - Continued). Thompson (1976) discusses the use of covariates in the analysis of life table data, introducing a logistic model for the conditional probability of failure in a time interval given survival to the beginning of the interval.

An example is given using the following data:

Table 1

Times of Remission (weeks) of Leukemia Patients
(Gehan (1965), from Freireich et al.)

Sample 0 (drug 6-MP):  6*, 6, 6, 6, 7, 9*, 10*, 10, 11*, 13, 16, 17*, 19*, 20*, 22, 23, 25*, 32*, 32*, 34*, 35*

Sample 1 (control):    1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23

* Censored

Here, the covariate effect is containment in sample 0 or sample 1. The conditional probability that individual i fails in interval j given survival to the beginning of the interval is represented as

$$\frac{\exp(Z_{ij0}\beta_0 + Z_{ij1}\beta_1 + \eta_j)}{1 + \exp(Z_{ij0}\beta_0 + Z_{ij1}\beta_1 + \eta_j)}$$

where $Z_{ijk} = 1$ if the ith individual is in the kth class, 0 otherwise. $\beta_0$ and $\beta_1$ are control and treatment effects respectively and $\eta_j$ is the effect of the jth interval.
We wish to test the hypothesis $H^*$, $\beta_0 = \beta_1$,
equal drug and control effect. Let $S = \{\beta \mid \beta_0 = \beta_1\}$,
where $\beta = (\beta_0, \beta_1, \eta_1, \dots, \eta_\ell)^T$. From Lemma 8, $S$ is
identifiable if and only if $\beta_0 - \beta_1$ is identifiable at $\beta$
for all $\beta$ in $S$. The density of the logistic life study
model is of form (3), in canonical form, $\eta(\beta) = Z\beta$,
and $\Omega = E^{\ell+2}$; thus, from Corollary 9, $\beta_0 - \beta_1$ is
identifiable at $\beta$ if and only if $\beta_0 - \beta_1$ is uniformly
identifiable. From Corollary 8, $\beta_0 - \beta_1$ is uniformly
identifiable if and only if $(1, -1, 0, \dots, 0)$ is in Row($Z$).
This will be the case if there is any interval with at least
one member from each class at risk in the interval. The
data in Table 1 show 21 members from class 0 and 21 members
from class 1 at risk in the first interval. Thus $H^*$ is
testable.
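The row-space criterion can be checked numerically. The following is a minimal sketch, not taken from the source: the helper names `design_matrix` and `in_row_space`, the toy risk-set layouts, and the 0-based interval index are all assumptions made for illustration. Each row of Z corresponds to one (class, interval) cell with at least one individual at risk; repeated rows would not change Row(Z).

```python
import numpy as np

def design_matrix(cells, n_intervals):
    """Build Z with columns (beta_0, beta_1, eta_1, ..., eta_l).
    `cells` lists (class k, interval j) pairs that have someone at risk."""
    rows = []
    for k, j in cells:
        row = np.zeros(2 + n_intervals)
        row[k] = 1.0        # class indicator
        row[2 + j] = 1.0    # interval indicator (j is 0-based here)
        rows.append(row)
    return np.array(rows)

def in_row_space(Z, a):
    """a lies in Row(Z) exactly when appending it does not raise the rank."""
    return np.linalg.matrix_rank(np.vstack([Z, a])) == np.linalg.matrix_rank(Z)

# Layout 1: both classes are at risk together in the first interval.
Z_shared = design_matrix([(0, 0), (1, 0), (0, 1), (1, 2)], n_intervals=3)
shared_ok = in_row_space(Z_shared, np.array([1.0, -1.0, 0.0, 0.0, 0.0]))

# Layout 2: the classes are never at risk in the same interval.
Z_split = design_matrix([(0, 0), (1, 1)], n_intervals=2)
split_ok = in_row_space(Z_split, np.array([1.0, -1.0, 0.0, 0.0]))
```

In the first layout the contrast is the difference of the two rows for the shared interval, so `shared_ok` is true; in the second no combination of rows cancels the interval columns, so `split_ok` is false.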

We show that the likelihood ratio test of $H^*$ is
the same as that of $H$, $\beta_0 = \beta_1 = 0$. Write

$\{\beta \mid \beta_0 = \beta_1 = 0\} = \{\beta \mid \beta_0 - \beta_1 = 0\} \cap \{\beta \mid \beta_0 + \beta_1 = 0\}$.

Now $(1, 1, 0, \dots, 0)$ is not in Row($Z$), $\beta_0 + \beta_1$ is not
uniformly identifiable, and $\{\beta \mid \beta_0 + \beta_1 = 0\}$ is not
identifiable. From Theorem 24, the likelihood ratio test of $H$
is the same as that of $H'$, $\beta$ is in $\{\beta \mid \beta_0 - \beta_1 = 0\} \cap
Z^{-1}(Z\{\beta \mid \beta_0 + \beta_1 = 0\})$. But $Z^{-1}(Z\{\beta \mid \beta_0 + \beta_1 = 0\}) = E^{\ell+2}$,
so $H' = H = H^*$.
We prove this last by showing that
$Z\{\beta \mid \beta_0 + \beta_1 \neq 0\} \subseteq Z\{\beta \mid \beta_0 + \beta_1 = 0\}$. Let $\beta^*$ be such that
$\beta^*_0 + \beta^*_1 \neq 0$. We claim there is $\beta^+$, with $\beta^+_0 + \beta^+_1 = 0$, such that
$Z\beta^+ = Z\beta^*$. Consider the equations, in the variable $\beta$,

$\beta_0 + \beta_1 = -(\beta^*_0 + \beta^*_1)$

and

$Z\beta = 0$.

These equations have a solution, $\beta'$, since $(1, 1, 0, \dots, 0)$ is
not in Row($Z$). Set $\beta^+ = \beta^* + \beta'$.
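The construction in this proof can be traced on a small case. The sketch below is illustrative only: the 4-row matrix `Z` (two classes, two intervals, every cell at risk) and the vector `beta_star` are hypothetical, and `numpy.linalg.lstsq` is used simply as a convenient way to solve the stacked linear system, which is consistent here because (1, 1, 0, 0) is not in Row(Z).

```python
import numpy as np

# Columns: beta_0, beta_1, eta_1, eta_2 (hypothetical layout).
Z = np.array([
    [1.0, 0.0, 1.0, 0.0],   # class 0, interval 1
    [0.0, 1.0, 1.0, 0.0],   # class 1, interval 1
    [1.0, 0.0, 0.0, 1.0],   # class 0, interval 2
    [0.0, 1.0, 0.0, 1.0],   # class 1, interval 2
])
a = np.array([1.0, 1.0, 0.0, 0.0])           # picks out beta_0 + beta_1

beta_star = np.array([0.7, -0.2, 0.3, 1.1])  # beta*_0 + beta*_1 = 0.5

# Stack the two sets of equations: a . beta = -(a . beta*), Z beta = 0.
A = np.vstack([a, Z])
rhs = np.concatenate(([-(a @ beta_star)], np.zeros(Z.shape[0])))
beta_prime, *_ = np.linalg.lstsq(A, rhs, rcond=None)

# beta_plus has the same image under Z but satisfies beta_0 + beta_1 = 0.
beta_plus = beta_star + beta_prime
```

Here the null space of `Z` is spanned by (1, 1, -1, -1), so `beta_prime` is a multiple of that vector; `beta_plus` then gives the same canonical parameter `Z @ beta_star` while lying in the set where the two class effects sum to zero.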

In summary, we have shown that $H^*$, $\beta_0 = \beta_1$, is
testable and that its likelihood ratio test is the same as
that for $H$, $\beta_0 = \beta_1 = 0$.


7. CONCLUSION

We  have  considered  problems  of  identification  which 
arise  in  making  inference  about  the  exponential  family  when 
the  density  is  not  in  one  to  one  correspondence  with  the 
parameter  space.  Such  problems  logically  precede  all 
questions  of  inference.  Using  data  we  cannot  hope  to 
distinguish  between  two  parametric  values  corresponding 
to  the  same  density. 

One can assume this problem away by a reparameterization,
but in doing so, the physical meaning associated
with the parameters might be lost.

As  a guiding  example  we  considered  the  normal  linear 
model of less than full rank, discussing the concepts of
estimable  function  and  testable  hypothesis.  Many  of  the 
classic  properties  proved  there  have  been  extended  to  the 
general  exponential  family  through  the  ideas  of  uniformly 
identifiable  function  and  identifiable  set. 

These  general  ideas  are  illustrated  with  several 
numerical  and  computational  examples:  i)  a Poisson  model 
for  the  analysis  of  some  data  on  the  survival  of  bacteria 
after  radiation,  ii)  a logistic  life  study  model, 
iii)  analysis  of  a retrospective  study  of  cancer  and 
smoking  and  iv)  a physical  example  involving  terminal 
velocity  of  a projectile.  It  is  found  that  some  parametric 
questions simply cannot be answered from data, for the
data  contains  no  information  about  them,  and  sometimes  two 
questions  cannot  be  distinguished  from  one  another  using 
data.  Other  parametric  questions  can  reasonably  be  asked 
and  answered  in  a data  analysis  sense. 